[InterMine Dev] Local phytomine build

Joe Carlson jwcarlson at lbl.gov
Fri May 1 19:48:35 BST 2015


Hi Richard,

Yes, I had incorporated your code changes. They seem to help a bit on the object insertion rate overall, but there is still the drop-off as I load more genomes, and it is more than what you had seen.

(Blue is the original, red is after getting my postgres parameters right, and green is with an explicit vacuum between loads plus your code changes.)

The peaks and valleys don’t quite match up since genome #10 of the different loads may be different organisms. But I’ve convinced myself that we’re better off in the latest insertion - especially at the end. I can’t complain too much since we’re seeing more than a 2X improvement at the end. But it is still a bit of a mystery.
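
For anyone wanting to reproduce the comparison: the rate being compared is objects stored divided by load time for each genome. Below is a minimal sketch of how two runs could be compared at the end of the load. The names (`LoadStep`, `insertion_rates`, `end_speedup`) and all of the numbers are hypothetical and are not taken from the actual build logs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoadStep:
    """One genome load: how many objects were stored and how long it took."""
    objects: int
    seconds: float


def insertion_rates(steps: List[LoadStep]) -> List[float]:
    """Objects stored per second for each genome, in load order."""
    return [s.objects / s.seconds for s in steps]


def end_speedup(old: List[LoadStep], new: List[LoadStep]) -> float:
    """Ratio of the final-genome rate in the new run to that in the old run."""
    return insertion_rates(new)[-1] / insertion_rates(old)[-1]


# Made-up numbers for illustration only; not taken from the real builds.
original = [LoadStep(1_000_000, 500.0), LoadStep(1_000_000, 2000.0)]
latest = [LoadStep(1_000_000, 400.0), LoadStep(1_000_000, 800.0)]

print(insertion_rates(original))
print(insertion_rates(latest))
print(end_speedup(original, latest))
```

Comparing only the final genome is a simplification, since genome #N may be a different organism in each run.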

Joe




On May 1, 2015, at 1:41 AM, Richard Smith <richard at flymine.org> wrote:

> Did you include the analyse changes in the latest build? I think that is a
> clear way to gain some improvement.
> 
> Richard.
> 
> 
>> Yeah. I was surprised myself. I tried a third pass in which I explicitly
>> did a vacuum and analyze between genome insertions and that did not help.
>> 
>> I had also noticed that the tracker table was not getting cleaned out
>> with a build-db if I kept the settings pointed to the same database
>> name. I made sure that was empty before I started.
>> 
>> No improvement between the second and third load.
>> 
>> But this is not really my biggest area of concern. I've started another
>> build from scratch. The thing that has been bothering me is all the
>> queries to see if the objects need to be merged. Your hint about putting
>> something in the id map to prevent the queries has me thinking that
>> this by itself will make all the difference.
>> 
>> Joe
>> 
>> On 04/30/2015 08:19 AM, Richard Smith wrote:
>>> Hi Joe,
>>> I'm very surprised the speed increase isn't larger with better Postgres
>>> settings. Maybe there is more to tweak or, as we discussed before, you
>>> may be limited by disk or network.
>>> 
>>> We have autovacuum switched off, I think so that it doesn't interfere
>>> with builds at the wrong time. For FlyMine we dump the database after
>>> building and reload it on a different server, which I think removes the
>>> need for vacuuming. I haven't ever looked into whether we have dead
>>> table or index entries.
>>> 
>>> All the best,
>>> Richard.
>>> 
>>> 
>>> 
>>>> Hi Richard,
>>>> 
>>>> I'm redoing some of the genome loading with the proper settings for
>>>> postgres on our hardware. In the last run I saw that the memory
>>>> settings were simply wrong and was worried that was causing the
>>>> drop-off.
>>>> 
>>>> I'm not totally done; only the first half or so. Loading performance is
>>>> slightly better, but I am still seeing the drop-off. (Red is the old
>>>> rate; blue is the new.)
>>>> 
>>>> Do you have autovacuum on, or do you manually vacuum?
>>>> 
>>>> I do not manually vacuum and have autovacuum on. I've asked that the
>>>> postgres config be set so that the autovacuums are logged, because right
>>>> now I don't know if and when they are run. I'm not totally sure how
>>>> postgres does this, but do the analyzes tend to accumulate dead rows in
>>>> the indexes? I've noticed that when doing a manual vacuum after a data
>>>> loading step, there are no dead rows in the table, but there are dead
>>>> rows in the indexes that are removed.
>>>> 
>>>> Joe
>>>> 
>>>> 
>>>> On 04/22/2015 09:49 AM, Richard Smith wrote:
>>>>> Hi Joe,
>>>>> I made a change to make ANALYSEs less frequent and only execute on
>>>>> tables for which a primary key is defined for the loading source. In a
>>>>> re-run of the phytomine build this seems to reduce the degradation but
>>>>> not fix it completely.
>>>>> 
>>>>> The build isn't quite finished but has completed 40 out of the 45
>>>>> genomes so far in about 24 hours. Graphs of the old and new builds are
>>>>> attached. The change I made is here:
>>>>> 
>>>>> https://github.com/intermine/intermine/pull/985
>>>>> 
>>>>> I think some ANALYSEs may need to be reinstated where deletes are run
>>>>> on indirection tables. I'll look into it some more.
>>>>> 
>>>>> Cheers,
>>>>> Richard.
>>>>> 
>>>>> 
>>>>>> Hi Richard,
>>>>>> 
>>>>>> This is interesting. The slower rate of degradation looks promising.
>>>>>> This is probably a combination of your better hardware and the fact
>>>>>> that I had some postgres parameters improperly set for our last load.
>>>>>> I hope to redo it in the next week or so to see if getting those
>>>>>> parameters right by itself helps out.
>>>>>> 
>>>>>> I’m also thinking about some strategies for more brute-force data
>>>>>> loading. If I get something running, I’ll let you know. We’re getting
>>>>>> to the position that we’re really going to have to get something much,
>>>>>> much faster.
>>>>>> 
>>>>>> Joe
>>>>>> 
>>>>>> 
>>>>>>> On Apr 15, 2015, at 9:18 AM, Richard Smith <richard at flymine.org> wrote:
>>>>>>> 
>>>>>>> Hi Joe,
>>>>>>> I started building phytomine from the chado dump you provided on one
>>>>>>> of our servers. It's only loaded 18 genomes so far (some failed, I
>>>>>>> skipped them) but it does show that it started off faster than your
>>>>>>> load but also seems to be slowing, though perhaps less rapidly.
>>>>>>> 
>>>>>>> You can see the progress in the attached graph - in an effort to
>>>>>>> one-up your graph I made the marker sizes proportional to the number
>>>>>>> of objects loaded.
>>>>>>> 
>>>>>>> The server it's running on isn't our newest (around 4-year-old 16-core
>>>>>>> 2.67GHz Xeon, 100GB RAM) but it does have a fast direct-attached disk
>>>>>>> array. The build java process was set to use 10GB RAM in ANT_OPTS.
>>>>>>> 
>>>>>>> I still think the slowdown is caused by ANALYSEs running on the
>>>>>>> target database. Once this has got a bit further I'll restart with
>>>>>>> the change I made to make them less frequent.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Richard.
>>>>>>> 
>>>>>>> 
>>>>>>> <phytomine-met1.png>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: LoadRate.png
Type: image/png
Size: 54182 bytes
Desc: not available
URL: <http://mail.intermine.org/pipermail/dev/attachments/20150501/1e454f1f/attachment-0001.png>

