[InterMine Dev] Local phytomine build

Joe Carlson jwcarlson at lbl.gov
Thu Apr 30 16:58:06 BST 2015

Yeah. I was surprised myself. I tried a third pass in which I explicitly 
did a vacuum and analyze between genome insertions and that did not help.
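
For reference, the vacuum-and-analyze step between genome insertions can be sketched roughly as below. This is only an illustration, not part of the InterMine build: the table names in DEFAULT_TABLES and the function name maintenance_statements are hypothetical, and the script only prints SQL so that it can be piped into psql (VACUUM cannot run inside a transaction block).

```python
import re

# Hypothetical table names, for illustration only; substitute the tables
# that the loading sources actually write to.
DEFAULT_TABLES = ["gene", "sequencefeature", "location"]

# Accept only plain identifiers so a table name cannot smuggle in extra SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def maintenance_statements(tables=DEFAULT_TABLES):
    """Return one VACUUM (VERBOSE, ANALYZE) statement per table."""
    statements = []
    for table in tables:
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unsafe table name: {table!r}")
        # VERBOSE makes Postgres report what it removed from the table
        # and from each of its indexes.
        statements.append(f'VACUUM (VERBOSE, ANALYZE) "{table}";')
    return statements


if __name__ == "__main__":
    for statement in maintenance_statements():
        print(statement)
```

If saved as, say, vacuum_stmts.py (a made-up file name), the output could be run between genome loads with something like: python vacuum_stmts.py | psql -d <database>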

I had also noticed that the tracker table was not getting cleaned out 
with a build-db if I kept the settings pointed to the same database 
name. I made sure that was empty before I started.

No improvement between the second and third load.

But this is not really my biggest area of concern. I've started another 
build from scratch. What has been bothering me is all the queries to 
see if the objects need to be merged. Your hint about putting something 
in the id map to prevent the queries has me thinking that this by 
itself will make all the difference.
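
To make the id map idea concrete, here is a minimal sketch of one way it could work. It assumes the hint means pre-registering objects that are known to be new, so that no merge lookup is issued for them. It is not InterMine's actual code: MergingWriter, register_new, store and find_equivalent are all hypothetical names, and find_equivalent stands in for the expensive "does an equivalent object already exist?" query.

```python
from typing import Callable, Dict, Optional


class MergingWriter:
    """Toy writer that consults an id map before running a merge query."""

    def __init__(self, find_equivalent: Callable[[str], Optional[int]]):
        # find_equivalent stands in for the database query that looks for
        # an existing object to merge with (hypothetical, for illustration).
        self._find_equivalent = find_equivalent
        self._id_map: Dict[str, int] = {}
        self._next_id = 1  # a real build would take ids from the database
        self.queries_run = 0

    def _allocate(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def register_new(self, source_id: str) -> int:
        """Record an object known to be new, so store() never queries for it."""
        target_id = self._allocate()
        self._id_map[source_id] = target_id
        return target_id

    def store(self, source_id: str) -> int:
        cached = self._id_map.get(source_id)
        if cached is not None:
            return cached  # already in the id map: skip the merge query
        self.queries_run += 1
        target_id = self._find_equivalent(source_id)
        if target_id is None:
            target_id = self._allocate()
        self._id_map[source_id] = target_id
        return target_id
```

In this sketch, a source that knows its objects cannot already exist in the target database would call register_new for each one before storing it, and queries_run would stay at zero for those objects.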


On 04/30/2015 08:19 AM, Richard Smith wrote:
> Hi Joe,
> I'm very surprised the speed increase isn't larger with better Postgres
> settings. Maybe there is more to tweak or, as we discussed before, you may
> be limited by disk or network.
> We have autovacuum switched off, I think so it doesn't interfere with
> builds at the wrong time.  For FlyMine we dump the database after building
> and reload it on a different server which I think removes the need for
> vacuuming. I haven't ever looked into whether we have dead table or index
> entries.
> All the best,
> Richard.
>> Hi Richard,
>> I'm redoing some of the genome loading with the proper settings for
>> postgres on our hardware. In the last run I saw that the memory settings
>> were simply wrong and was worried that was causing the drop off.
>> I'm not totally done; only the first half or so. Loading performance is
>> slightly better, but I still am seeing the drop off. (Red is the old
>> rate; blue is the new.)
>> Do you have autovacuum on, or do you manually vacuum?
>> I do not manually vacuum and have autovacuum on. I've asked that the
>> postgres config be set so that the autovacuums are logged; for now I
>> don't know if and when they are run. I'm not totally sure how postgres
>> does this, but do the analyzes tend to accumulate dead rows in the
>> indexes?
>> I've noticed that when doing a manual vacuum after a data loading step,
>> there are no dead rows in the table, but there are dead rows in the
>> indexes that are removed.
>> Joe
>> On 04/22/2015 09:49 AM, Richard Smith wrote:
>>> Hi Joe,
>>> I made a change to make ANALYSEs less frequent and only execute on
>>> tables for which a primary key is defined for the loading source. In a
>>> re-run of the phytomine build this seems to help the degradation but
>>> not fix it completely.
>>> The build isn't quite finished but has completed 40 out of the 45
>>> genomes so far in about 24 hours.  Graphs of the old and new builds are
>>> attached.
>>> The change I made is here:
>>> https://github.com/intermine/intermine/pull/985
>>> I think some ANALYSEs may need to be re-instated where deletes are run
>>> on indirection tables. I'll look into it some more.
>>> Cheers,
>>> Richard.
>>>> Hi Richard,
>>>> This is interesting. The slower rate of degradation looks promising.
>>>> This is probably a combination of your better hardware and the fact
>>>> that I had some postgres parameters improperly set for our last load.
>>>> I hope to redo it in the next week or so to see if getting those
>>>> parameters right by itself helps.
>>>> I’m also thinking about some strategies for a more brute force data
>>>> loading. If I get something running, I’ll let you know. We’re getting
>>>> to the position that we’re really going to have to get something much,
>>>> much faster.
>>>> Joe
>>>>> On Apr 15, 2015, at 9:18 AM, Richard Smith <richard at flymine.org> wrote:
>>>>> Hi Joe,
>>>>> I started building phytomine from the chado dump you provided on one
>>>>> of our servers. It's only loaded 18 genomes so far (some failed, I
>>>>> skipped them) but it does show that it started off faster than your
>>>>> load but also seems to be slowing, but perhaps less rapidly.
>>>>> You can see the progress in the attached graph - in an effort to
>>>>> one-up your graph I made the marker sizes proportional to the number
>>>>> of objects loaded.
>>>>> The server it's running on isn't our newest (around 4 year old 16 core
>>>>> 2.67GHz Xeon, 100GB RAM) but it does have a fast direct attached disk
>>>>> array. The build java process was set to use 10GB RAM in ANT_OPTS.
>>>>> I still think the slowdown is caused by ANALYSEs running on the target
>>>>> database; once this has got a bit further I'll restart with the change
>>>>> I made to make them less frequent.
>>>>> Cheers,
>>>>> Richard.
>>>>> <phytomine-met1.png>