[InterMine Dev] Local phytomine build

Joe Carlson jwcarlson at lbl.gov
Thu Apr 23 19:24:45 BST 2015


Hi Richard,

I'm redoing some of the genome loading with the proper settings for 
postgres on our hardware. In the last run I saw that the memory settings 
were simply wrong and was worried that was causing the drop off.

I'm not totally done; only the first half or so. Loading performance is
slightly better, but I am still seeing the drop off. (Red is the old
rate; blue is the new.)

Do you have autovacuum on? or do you manually vacuum?

I do not manually vacuum and have autovacuum on. I've asked that the
postgres config be set so that the autovacuums are logged; until that is
done I don't know if and when they are run. I'm not totally sure how
postgres does this, but do the ANALYZEs tend to accumulate dead rows in
the indexes?
I've noticed that when doing a manual vacuum after a data loading step, 
there are no dead rows in the table, but there are dead rows in the 
indexes that are removed.
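
In the meantime, here is a minimal sketch of how the autovacuum activity
could be checked from the statistics views rather than the logs. It
assumes a DB-API connection to the target database (psycopg2 or similar);
the function names are made up for illustration and are not part of
InterMine. Note that pg_stat_user_tables only reports table-level
counts, so it would not show the dead index rows mentioned above.

```python
# Sketch only: pg_stat_user_tables is a standard PostgreSQL statistics view;
# the function names and connection handling are illustrative assumptions.

VACUUM_STATS_SQL = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
"""


def fetch_vacuum_stats(conn):
    """Run the query on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(VACUUM_STATS_SQL)
        return cur.fetchall()
    finally:
        cur.close()


def summarize(rows, min_dead=1):
    """Return (table, dead_tuples, never_autovacuumed) for each table
    that has at least min_dead dead tuples."""
    report = []
    for relname, _n_live, n_dead, last_autovacuum, _last_autoanalyze in rows:
        if n_dead >= min_dead:
            report.append((relname, n_dead, last_autovacuum is None))
    return report
```

Calling summarize(fetch_vacuum_stats(conn)) between loading steps would
list the tables carrying dead tuples and flag those that autovacuum has
never visited. For the logging itself, the relevant postgresql.conf
setting is log_autovacuum_min_duration (0 logs every autovacuum run).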

Joe


On 04/22/2015 09:49 AM, Richard Smith wrote:
> Hi Joe,
> I made a change to make ANALYSEs less frequent and only execute on tables
> for which a primary key is defined for the loading source. In a re-run of
> the phytomine build this seems to help the degradation but not fix it
> completely.
>
> The build isn't quite finished but has completed 40 out of the 45 genomes
> so far in about 24 hours.  Graphs of the old and new builds are attached.
> The change I made is here:
>
> https://github.com/intermine/intermine/pull/985
>
> I think some ANALYSEs may need to be re-instated where deletes are run on
> indirection tables. I'll look into it some more.
>
> Cheers,
> Richard.
>
>
>> Hi Richard,
>>
>> This is interesting. The slower rate of degradation looks promising. This
>> is probably a combination of your better hardware and the fact that I had
>> some postgres parameters improperly set for our last load. I hope to redo
>> it in the next week or so to see if getting those parameters right by
>> itself helps out.
>>
>> I’m also thinking about some strategies for a more brute force data
>> loading. If I get something running, I’ll let you know. We’re getting to
>> the position that we’re really going to have to get something much, much
>> faster.
>>
>> Joe
>>
>>
>>> On Apr 15, 2015, at 9:18 AM, Richard Smith <richard at flymine.org> wrote:
>>>
>>> Hi Joe,
>>> I started building phytomine from the chado dump you provided on one of
>>> our servers. It's only loaded 18 genomes so far (some failed, I skipped
>>> them) but it does show that it started off faster than your load but
>>> also seems to be slowing, but perhaps less rapidly.
>>>
>>> You can see the progress in the attached graph - in an effort to one-up
>>> your graph I made the marker sizes proportional to the number of objects
>>> loaded.
>>>
>>> The server it's running on isn't our newest (around 4 year old 16 core
>>> 2.67GHz Xeon, 100GB RAM) but it does have a fast direct attached disk
>>> array. The build java process was set to use 10GB RAM in ANT_OPTS.
>>>
>>> I still think the slowdown is caused by ANALYSEs running on the target
>>> database; once this has got a bit further I'll restart with the change I
>>> made to make them less frequent.
>>>
>>> Cheers,
>>> Richard.
>>>
>>>
>>> <phytomine-met1.png>
>>
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: newRate.png
Type: image/png
Size: 35164 bytes
Desc: not available
URL: <http://mail.intermine.org/pipermail/dev/attachments/20150423/398bdcbf/attachment-0001.png>

