[InterMine Dev] Local phytomine build

Richard Smith richard at flymine.org
Fri May 1 09:41:56 BST 2015

Did you include the analyse changes in the latest build? I think that is a
clear way to gain some improvement.
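
For reference, the idea behind the analyse change (run ANALYSE less often, and only on tables that have a primary key defined for the loading source, as described further down the thread) can be sketched roughly as below. This is only an illustration in Python, not the code from the pull request: the class name, the `keyed_tables` set and the `min_rows_between` threshold are all made up for the example.

```python
class AnalyzeThrottle:
    """Decide when a bulk loader should issue ANALYZE on a table.

    Hypothetical sketch: names and threshold are assumptions, not InterMine code.
    """

    def __init__(self, keyed_tables, min_rows_between=100_000):
        # Only tables with a primary key defined for the current source qualify.
        self.keyed_tables = set(keyed_tables)
        self.min_rows_between = min_rows_between
        self._pending = {}  # table -> rows written since the last ANALYZE

    def record_write(self, table, rows):
        """Note `rows` new rows in `table`; return True if an ANALYZE is due."""
        if table not in self.keyed_tables:
            return False
        pending = self._pending.get(table, 0) + rows
        if pending >= self.min_rows_between:
            self._pending[table] = 0
            return True
        self._pending[table] = pending
        return False


throttle = AnalyzeThrottle(keyed_tables={"gene"}, min_rows_between=1000)
for batch in range(5):
    if throttle.record_write("gene", 400):
        # A real loader would execute "ANALYZE gene" against the target here.
        print(f"batch {batch}: ANALYZE gene")
```

The point is just that the ANALYSE cost is paid once per threshold's worth of rows rather than once per batch, and never for tables the integration queries don't touch.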


> Yeah. I was surprised myself. I tried a third pass in which I explicitly
> did a vacuum and analyze between genome insertions and that did not help.
> I had also noticed that the tracker table was not getting cleaned out
> with a build-db if I kept the settings pointed to the same database
> name. I made sure that was empty before I started.
> No improvement between the second and third load.
> But this is not really my biggest area of concern. I've started another
> build from scratch. The thing that has been bothering me is all the
> queries to see if the objects need to be merged. Your hint about putting
> something in the id map to prevent the queries has me thinking that
> itself will make all the difference.
> Joe
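
A rough sketch of the id map idea mentioned above, assuming the loader can be told that an object cannot already exist in the target (for example, the first load of a new genome). Everything here is hypothetical and in Python for brevity: `MergingLoader`, `known_new` and the dict standing in for the target database are not InterMine's actual API.

```python
class MergingLoader:
    """Toy loader that skips the 'is there an equivalent object?' query
    when the id map already has an entry or the caller knows the object is new.

    Hypothetical sketch, not InterMine code.
    """

    def __init__(self, existing):
        self.existing = dict(existing)  # stand-in for target DB: key -> target id
        self.id_map = {}                # source id -> target id
        self.queries = 0                # number of equivalence queries issued
        self._next_id = max(self.existing.values(), default=0) + 1

    def _query_equivalent(self, key):
        # Stands in for the expensive query against the target database.
        self.queries += 1
        return self.existing.get(key)

    def store(self, source_id, key, known_new=False):
        if source_id in self.id_map:
            return self.id_map[source_id]  # already resolved, no query
        target_id = None if known_new else self._query_equivalent(key)
        if target_id is None:
            # Nothing to merge with: allocate a fresh target id.
            target_id = self._next_id
            self._next_id += 1
            self.existing[key] = target_id
        self.id_map[source_id] = target_id
        return target_id


loader = MergingLoader(existing={"geneA": 1})
loader.store("s1", "geneA")                  # queried, merges with target id 1
loader.store("s2", "geneB", known_new=True)  # no query, new target id
loader.store("s2", "geneB")                  # id map hit, no query
print(loader.queries)
```

Whether this is safe depends entirely on the `known_new` assumption holding; if it is wrong, duplicates end up in the target instead of being merged.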
> On 04/30/2015 08:19 AM, Richard Smith wrote:
>> Hi Joe,
>> I'm very surprised the speed increase isn't larger with better Postgres
>> settings. Maybe there is more to tweak or, as we discussed before, you may
>> be limited by disk or network.
>> We have autovacuum switched off, I think so it doesn't interfere with
>> builds at the wrong time.  For FlyMine we dump the database after building
>> and reload it on a different server which I think removes the need for
>> vacuuming. I haven't ever looked into whether we have dead table or index
>> entries.
>> All the best,
>> Richard.
>>> Hi Richard,
>>> I'm redoing some of the genome loading with the proper settings for
>>> postgres on our hardware. In the last run I saw that the memory settings
>>> were simply wrong and was worried that was causing the drop off.
>>> I'm not totally done; only the first half or so. Loading performance is
>>> slightly better, but I am still seeing the drop off. (Red is the old
>>> rate; blue is the new.)
>>> Do you have autovacuum on? or do you manually vacuum?
>>> I do not manually vacuum and have autovacuum on. I've asked that the
>>> postgres config be set so that the autovacuums are logged; until then I
>>> don't know if and when they are run. I'm not totally sure how postgres
>>> does this, but do the analyzes tend to accumulate dead rows in the
>>> indexes? I've noticed that when doing a manual vacuum after a data
>>> loading step, there are no dead rows in the table, but there are dead
>>> rows in the indexes that are removed.
>>> Joe
>>> On 04/22/2015 09:49 AM, Richard Smith wrote:
>>>> Hi Joe,
>>>> I made a change to make ANALYSEs less frequent and only execute on tables
>>>> for which a primary key is defined for the loading source. In a re-run of
>>>> the phytomine build this seems to help the degradation but not fix it
>>>> completely.
>>>> The build isn't quite finished but has completed 40 out of the 45 genomes
>>>> so far in about 24 hours.  Graphs of the old and new builds are attached.
>>>> The change I made is here:
>>>> https://github.com/intermine/intermine/pull/985
>>>> I think some ANALYSEs may need to be re-instated where deletes are run on
>>>> indirection tables. I'll look into it some more.
>>>> Cheers,
>>>> Richard.
>>>>> Hi Richard,
>>>>> This is interesting. The slower rate of degradation looks promising. This
>>>>> is probably a combination of your better hardware and the fact that I had
>>>>> some postgres parameters improperly set for our last load. I hope to redo
>>>>> it in the next week or so to see if getting those parameters right by
>>>>> itself helps out.
>>>>> I’m also thinking about some strategies for a more brute force data
>>>>> loading. If I get something running, I’ll let you know. We’re getting to
>>>>> the position that we’re really going to have to get something much, much
>>>>> faster.
>>>>> Joe
>>>>>> On Apr 15, 2015, at 9:18 AM, Richard Smith <richard at flymine.org> wrote:
>>>>>> Hi Joe,
>>>>>> I started building phytomine from the chado dump you provided on one of
>>>>>> our servers. It's only loaded 18 genomes so far (some failed, I skipped
>>>>>> them) but it does show that it started off faster than your load; it also
>>>>>> seems to be slowing, though perhaps less rapidly.
>>>>>> You can see the progress in the attached graph - in an effort to one-up
>>>>>> your graph I made the marker sizes proportional to the number of objects
>>>>>> loaded.
>>>>>> The server it's running on isn't our newest (around 4 year old 16 core
>>>>>> 2.67GHz Xeon, 100GB RAM) but it does have a fast direct attached disk
>>>>>> array. The build java process was set to use 10GB RAM in ANT_OPTS.
>>>>>> I still think the slowdown is caused by ANALYSEs running on the target
>>>>>> database. Once this has got a bit further I'll restart with the change I
>>>>>> made to make them less frequent.
>>>>>> Cheers,
>>>>>> Richard.
>>>>>> <phytomine-met1.png>
