[InterMine Dev] Local phytomine build

Joe Carlson jwcarlson at lbl.gov
Thu Apr 23 19:31:40 BST 2015

I should have added: I'm doing this load just to see if the mis-set
postgres parameters were the cause of all my problems. I had not pulled in
your changes on the analyzes. That will be the next experiment.


On 04/23/2015 11:24 AM, Joe Carlson wrote:
> Hi Richard,
> I'm redoing some of the genome loading with the proper settings for 
> postgres on our hardware. In the last run I saw that the memory 
> settings were simply wrong and was worried that was causing the drop off.
> I'm not totally done; only the first half or so. Loading performance
> is slightly better, but I am still seeing the drop off. (Red is the
> old rate; blue is the new.)
> Do you have autovacuum on? or do you manually vacuum?
> I do not manually vacuum and have autovacuum on. I've asked that the
> postgres config be set so that the autovacuums are logged; until then I
> don't know if and when they are run. I'm not totally sure how postgres
> does this, but do the analyzes tend to accumulate dead rows in the
> indexes? I've noticed that when doing a manual vacuum after a data
> loading step, there are no dead rows in the table, but there are dead
> rows in the indexes that are removed.
> Joe
> On 04/22/2015 09:49 AM, Richard Smith wrote:
>> Hi Joe,
>> I made a change to make ANALYSEs less frequent and only execute on tables
>> for which a primary key is defined for the loading source. In a re-run of
>> the phytomine build this seems to help the degradation but not fix it
>> completely.
>> The build isn't quite finished but has completed 40 out of the 45 genomes
>> so far in about 24 hours. Graphs of the old and new builds are attached.
>> The change I made is here:
>> https://github.com/intermine/intermine/pull/985
>> I think some ANALYSEs may need to be re-instated where deletes are run on
>> indirection tables. I'll look into it some more.
>> Cheers,
>> Richard.
>>> Hi Richard,
>>> This is interesting. The slower rate of degradation looks promising. This
>>> is probably a combination of your better hardware and the fact that I had
>>> some postgres parameters improperly set for our last load. I hope to redo
>>> it in the next week or so to see if getting those parameters right by
>>> itself helps out.
>>> I’m also thinking about some strategies for a more brute force data
>>> loading. If I get something running, I’ll let you know. We’re getting to
>>> the position that we’re really going to have to get something much, much
>>> faster.
>>> Joe
>>>> On Apr 15, 2015, at 9:18 AM, Richard Smith <richard at flymine.org> wrote:
>>>> Hi Joe,
>>>> I started building phytomine from the chado dump you provided on one of
>>>> our servers. It's only loaded 18 genomes so far (some failed, I skipped
>>>> them) but it does show that it started off faster than your load. It
>>>> also seems to be slowing, though perhaps less rapidly.
>>>> You can see the progress in the attached graph - in an effort to one-up
>>>> your graph I made the marker sizes proportional to the number of objects
>>>> loaded.
>>>> The server it's running on isn't our newest (around 4 years old, 16-core
>>>> 2.67GHz Xeon, 100GB RAM) but it does have a fast direct attached disk
>>>> array. The build java process was set to use 10GB RAM in ANT_OPTS.
>>>> I still think the slowdown is caused by ANALYSEs running on the target
>>>> database. Once this has got a bit further I'll restart with the change I
>>>> made to make them less frequent.
>>>> Cheers,
>>>> Richard.
>>>> <phytomine-met1.png>
