[InterMine Dev] reducing source loading times

JD Wong jdmswong at gmail.com
Mon Feb 13 20:16:50 GMT 2012


The GO source has been running for 40 minutes now and is using 20 GB of RAM
with 8% CPU.

-JD

On Mon, Feb 13, 2012 at 11:30 AM, Richard Smith <richard at flymine.org> wrote:

> On 10/02/2012 17:37, Benjamin Hitz wrote:
>
>>
>> Not that I have ever loaded an InterMine, but... it sort of sounds like
>> you guys are not all using the same GO files.
>> There are a few versions of the .obo file (at least one of which is
>> "reasoning enabled", which might not be what you want).
>>
>
> Good point.  It looks like we're still fetching the 1.0 format file:
>
> http://geneontology.org/ontology/obo_format_1_0/gene_ontology.1_0.obo
>
> I guess we need to update the parser and test it with 1.2, but in the
> meantime could JD and Thomas try the 1.0 version to see what the load
> time is like?
>
> Thanks,
> Richard.
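Ben's point can be checked directly: OBO files declare their format in a `format-version:` header tag, so reading that tag shows whether a local GO file is the 1.0 or the 1.2 format. A minimal sketch, assuming a POSIX shell with awk; `obo_format_version` is a hypothetical helper name, not part of InterMine:

```shell
# Hypothetical helper: print the value of the first "format-version:" header
# tag in an OBO file (e.g. "1.0" or "1.2") and stop reading the file there.
obo_format_version() {
    awk '/^format-version:/ { print $2; exit }' "$1"
}

# Example usage (the file name is an assumption; use your GO source's file):
# obo_format_version gene_ontology.1_0.obo
```

This only reports the format version; it does not by itself show whether the file is one of the "reasoning enabled" variants Ben mentions.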
>
>
>> There is one HUGE gene_association file (gene_association.uniprot),
>> which is something like 12M lines. It takes some time to chew through
>> that, so be sure you want it.
>>
>> Ben
>>
>>
>> On Feb 10, 2012, at 8:48 AM, Thomas TRIPLET wrote:
>>
>>> I have the same issue: loading GO is extremely slow (on v0.97), and I
>>> haven't found any solution yet =/
>>> If you find any, please let us know.
>>> Thanks
>>> Thomas
>>>
>>>
>>> Thomas Triplet, Ph.D.
>>> http://www.thomastriplet.net
>>>
>>>
>>> Centre for Structural and Functional Genomics
>>> Concordia University
>>> 7141 West Sherbrooke St
>>> Montreal QC H4B 1R6
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 10, 2012 at 10:55 AM, JD Wong <jdmswong at gmail.com> wrote:
>>>
>>>    I'll update this thread when a solution is found
>>>
>>>
>>>    On Fri, Feb 10, 2012 at 10:54 AM, JD Wong <jdmswong at gmail.com> wrote:
>>>
>>>        In other words, I set ANT_OPTS="... -Xmx20000m" in my .bashrc
>>>        file. 20 GB should be a good amount, and since these values
>>>        carry over to the java calls that ant makes, this is a strange
>>>        problem indeed...
>>>
>>>        -JD
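The JVM expects the heap size immediately after `-Xmx`, with no space (`-Xmx20000m`); written as `-Xmx 20000m` the option is rejected as an invalid maximum heap size. A minimal ~/.bashrc sketch, assuming bash and leaving out the other options JD elided as "..."; `ant_opts_has_xmx` is a hypothetical sanity-check helper, not part of Ant or InterMine. Note also that ANT_OPTS sizes the JVM that runs Ant itself: a task that forks a separate JVM does not inherit ANT_OPTS and needs its own heap setting (for example the `<java>` task's `maxmemory` attribute or a nested `<jvmarg>`).

```shell
# ~/.bashrc fragment (sketch): a 20 GB maximum heap for the JVM started by
# the ant launcher script. There is no space between -Xmx and the size.
export ANT_OPTS="-Xmx20000m"

# Hypothetical sanity check: succeeds only if ANT_OPTS contains an -Xmx
# option followed directly by a digit (so "-Xmx 20000m" is rejected).
ant_opts_has_xmx() {
    case " $ANT_OPTS " in
        *" -Xmx"[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `java -Xmx20000m -version` is also a quick way to confirm the JVM accepts the option before starting a long build.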
>>>
>>>
>>>        On Wed, Feb 8, 2012 at 4:04 PM, JD Wong <jdmswong at gmail.com> wrote:
>>>
>>>            ANT_OPTS has 20GB allocated to it
>>>
>>>
>>>            On Wed, Feb 8, 2012 at 1:07 PM, Richard Smith
>>>            <richard at flymine.org> wrote:
>>>
>>>                Hi JD,
>>>                It looks like it's the OBO-Edit reasoner that is
>>>                taking all the time:
>>>
>>>                2012-02-07 11:06:32 INFO org.obo.reasoner.impl.LinkPileReasoner - Total reasoner time = 2130574.717897 ms
>>>
>>>                That is about 35 minutes. On the latest FlyMine build it
>>>                took 30 seconds.
>>>                I guess this is a RAM thing. For the FlyMine build we had
>>>                a 32 GB heap allocated to the Java process. How much did
>>>                you have?
>>>
>>>                The rest of the build looks like it ran fast: about 1
>>>                million objects loaded in five minutes, which is good.
>>>
>>>
>>>                Cheers,
>>>                Richard.
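To pull the same figure out of your own intermine.log, a minimal sketch that finds the "Total reasoner time = ... ms" entries and converts milliseconds to minutes (2130574.717897 ms / 60000 ms per minute is about 35.5 minutes). It assumes each log entry sits on a single line in the log file (the mail wraps it) and that the value is followed by "ms"; `reasoner_minutes` is a hypothetical helper name:

```shell
# Hypothetical helper: report every "Total reasoner time = <N> ms" entry in an
# InterMine log as minutes. Reads ./intermine.log if no file is given.
reasoner_minutes() {
    grep 'Total reasoner time' "${1:-intermine.log}" |
        awk -F'= ' '{ split($2, t, " "); printf "%.1f minutes\n", t[1] / 60000 }'
}
```

Run from the integrate directory, `reasoner_minutes` with no argument reads the intermine.log that Richard asked for.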
>>>
>>>
>>>
>>>
>>>                On 07/02/2012 16:30, JD Wong wrote:
>>>
>>>                    Sure
>>>
>>>                    On Tue, Feb 7, 2012 at 8:34 AM, Richard Smith
>>>                    <richard at flymine.org> wrote:
>>>
>>>                    JD,
>>>                    Could you send us the intermine.log from your
>>>                    integrate directory after running a build? This is
>>>                    the most helpful thing for us when investigating
>>>                    performance.
>>>
>>>                    Thanks,
>>>                    Richard.
>>>
>>>
>>>
>>>
>>>
>>>                    On 06/02/2012 19:01, JD Wong wrote:
>>>
>>>                    I was wondering how the other MODs speed up their
>>>                    builds. I have configured Ant, Java, and PostgreSQL
>>>                    accordingly, without effect. I was hoping to get the
>>>                    community's advice on this aspect.
>>>
>>>                    Cheers,
>>>                    -JD
>>>
>>>                    On Thu, Feb 2, 2012 at 10:11 AM, JD Wong
>>>                    <jdmswong at gmail.com> wrote:
>>>
>>>                    I haven't given Ant and PostgreSQL enough memory to
>>>                    use up all of the RAM when they run simultaneously.
>>>                    Also, there is plenty of free memory during loading.
>>>
>>>                    -JD
>>>
>>>
>>>                    On Wed, Feb 1, 2012 at 10:35 PM, Josh Goodman
>>>                    <jogoodma at indiana.edu> wrote:
>>>
>>>                    You need to be careful of the various memory settings
>>>                    here. If you set Ant really high (>25% of total
>>>                    memory) and you are also setting Pg high, you could be
>>>                    suffering from the two of them fighting over system
>>>                    resources and causing the swap to get thrashed. I would
>>>                    run the Unix "free" command while you are running a
>>>                    load to see what is going on with memory.
>>>
>>>                    e.g. (figures in megabytes, refreshed every 5 seconds):
>>>
>>>                    free -m -s 5
>>>
>>>                    If you have other processes running on this machine
>>>                    (e.g. Tomcat instances), you also need to adjust Ant
>>>                    and Pg to take that into account.
>>>
>>>                    Josh
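Following Josh's rule of thumb, a rough starting point is to keep Ant's heap at or below about a quarter of physical RAM when PostgreSQL is also tuned high on the same machine. A minimal, Linux-only sketch that reads MemTotal (reported in kB) from /proc/meminfo and prints that quarter in megabytes; the 25% figure comes from Josh's message, and `ant_heap_cap_mb` is a hypothetical helper name:

```shell
# Hypothetical helper: print ~25% of total physical memory, in MB, as a
# ceiling to consider for Ant's -Xmx when PostgreSQL shares the machine.
# Reads MemTotal (in kB) from /proc/meminfo by default (Linux only).
ant_heap_cap_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 4 }' "${1:-/proc/meminfo}"
}

# Example: echo "export ANT_OPTS=\"-Xmx$(ant_heap_cap_mb)m\""
```

This is only a starting point; watching `free -m -s 5` during a load is what shows whether Ant and PostgreSQL are actually competing for memory and pushing the machine into swap.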
>>>
>>>                    On Wed, Feb 1, 2012 at 5:20 PM, JD Wong
>>>                    <jdmswong at gmail.com> wrote:
>>>                    > Hi all,
>>>                    > Loading my GO source takes 2500 seconds on average.
>>>                    > I have tuned the PostgreSQL configuration parameters
>>>                    > to the desired values and gave Ant a large heap, to
>>>                    > no avail. Is there a way to speed this up?
>>>                    >
>>>                    > -JD
>>>                    >
>>>                    > _______________________________________________
>>>                    > dev mailing list
>>>                    > dev at intermine.org
>>>                    > http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>                    >
>>
>> --
>> Ben Hitz
>> Senior Scientific Programmer, Saccharomyces Genome Database, GO
>> Consortium
>> Stanford University, hitz at stanford.edu
>>