[InterMine Dev] Integration of huge amount data

Chen Yian chenyian at nibio.go.jp
Tue Feb 14 04:43:22 GMT 2012

Hi Richard,

Thank you for the reply.
I realized that I've never integrated more than 5m items before, so I 
should be more patient.

My log file was deleted when I ran ant clean, but I'll run the load again.

About the storing order:
I believe I do follow the rule of storing objects before any other 
objects that reference them.
But I suspect I didn't have enough RAM (currently 16GB).
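
To illustrate the ordering rule, here is a minimal sketch in plain Python. It is not InterMine code: the item ids and the storage_order helper are hypothetical names made up for this example. It only shows that referenced objects (proteins, CATH classifications) come out before the domain regions that point at them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def storage_order(references):
    """Return item ids so that every referenced item precedes its referrers.

    `references` maps each item id to the set of item ids it references.
    This is a hypothetical helper for illustration, not an InterMine API.
    """
    # TopologicalSorter expects node -> predecessors, and yields
    # predecessors (the referenced items) before the nodes that need them.
    return list(TopologicalSorter(references).static_order())


# Made-up items shaped like the model in this mail.
refs = {
    "domain_1": {"protein_1", "cath_1"},
    "domain_2": {"protein_1", "cath_2"},
    "protein_1": set(),
    "cath_1": set(),
    "cath_2": set(),
}

order = storage_order(refs)
print(order)
```

Any order in which protein_1, cath_1 and cath_2 appear before the two domain items satisfies the rule.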

My model is simple,

<class name="Protein" is-interface="true">
  <collection name="structuralDomains" referenced-type="StructuralDomainRegion" reverse-reference="protein"/>
</class>

<class name="ProteinRegion" is-interface="true">
  <attribute name="start" type="java.lang.Integer"/>
  <attribute name="end" type="java.lang.Integer"/>
</class>

<class name="StructuralDomainRegion" extends="ProteinRegion">
  <reference name="protein" referenced-type="Protein"/>
  <reference name="cathClassification" referenced-type="CathClassification" ordered="true"/>
  <collection name="dataSets" referenced-type="DataSet"/>
</class>

<class name="CathClassification" is-interface="true">
  <attribute name="cathCode" type="java.lang.String"/>
</class>


By the way, we are going to get a new machine with a better CPU and 
more RAM.
I hope it will also improve the build performance.
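
As a rough sanity check, the figures quoted in this thread can be turned into throughput rates. The sketch below assumes one stored object per assignment and per protein (16m + 15m), which is only a lower bound on the real object count, and it assumes load time scales linearly with object count, which may not hold in practice. The function names are made up for this example.

```python
def objects_per_hour(objects, hours):
    """Average load rate implied by a completed build."""
    return objects / hours


def hours_needed(objects, rate):
    """Estimated load time at a given rate (assumes linear scaling)."""
    return objects / rate


# Figures quoted in the thread.
metabolic_rate = objects_per_hour(120_000_000, 40)    # metabolicMine build
arabidopsis_rate = objects_per_hour(300_000_000, 24)  # Arabidopsis demo

# Assumed lower bound: one object per assignment plus one per protein.
my_objects = 16_000_000 + 15_000_000

print(f"{hours_needed(my_objects, metabolic_rate):.1f} h at the metabolicMine rate")
print(f"{hours_needed(my_objects, arabidopsis_rate):.1f} h at the Arabidopsis rate")
```

Under these assumptions the estimates are about 10.3 and 2.5 hours, so a load that does not finish in 15 hours is slower than either reference build.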

Thanks again.

Best wishes,


(2012/02/14 5:17), dev-request at intermine.org wrote:
> Date: Mon, 13 Feb 2012 15:48:24 +0000
> From: Richard Smith <richard at flymine.org>
> To: dev at intermine.org
> Subject: Re: [InterMine Dev] Integration of huge amount data
> Hi Chen,
> Those numbers should work fine, metabolicMine contains 120m objects but
> took 40 hours for the last build (including post-processing).  I'm not
> sure why there was a big slow-down since the previous release.
> We have a demo Arabidopsis variation database that loaded over 300m
> objects in 24 hours.
> There can be many reasons for the load to run slowly, if you still have
> the intermine.log from the integrate directory where you ran the load
> that would be a big help.
> It's possible that without enough RAM the process was swapping.  Also
> the order in which items are stored has an effect on speed, see:
> 	http://intermine.org/wiki/DataLoadingPerformance
> It also depends on exactly what is being stored (i.e. objects with
> large strings take longer) and how the data is modeled.  Maybe you could
> send the additions file you used as well?
> Thanks,
> Richard.
> On 13/02/2012 09:29, Chen Yian wrote:
>> Hi all,
>> Does anyone have any experience integrating large quantities of data
>> into the InterMine system?
>> I've tried to incorporate gene3d domain assignment, which contains 16m
>> assignments for 15m proteins.
>> I think I might have created more than 30m objects during integration.
>> Storing these items took a very long time, and it had not finished
>> after 15 hours (I finally gave up).
>> Any suggestions?
>> Thank you.
>> Best,
>> Chen
>> _______________________________________________
>> dev mailing list
>> dev at intermine.org
>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
