[InterMine Dev] Integration of huge amount data
chenyian at nibio.go.jp
Tue Feb 14 04:43:22 GMT 2012
Thank you for the reply.
I realized that I've never integrated more than 5m items before, and I
should be more patient.
After running ant clean, my log file is gone, but I'll try again.
About the storing order,
I think I do follow the rule about storing objects before any other
objects that reference them.
But I guess I didn't have enough RAM (16GB currently).
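To make the ordering rule concrete, here is a minimal sketch of putting items into an order where every referenced item comes before the items that point to it. It is plain Python, not InterMine code; the function name and the item ids are made up for illustration, and it assumes the references contain no cycles.

```python
from collections import deque

def storage_order(references):
    """Return item ids so that referenced items precede their referrers.

    `references` maps an item id to the ids it points to (hypothetical ids).
    """
    # Unstored targets still blocking each item.
    pending = {item: set(targets) for item, targets in references.items()}
    for targets in references.values():
        for target in targets:
            pending.setdefault(target, set())

    # Reverse index: which items point at a given item.
    referrers = {item: [] for item in pending}
    for item, targets in pending.items():
        for target in targets:
            referrers[target].append(item)

    # Items that reference nothing can be stored first.
    ready = deque(sorted(item for item, targets in pending.items() if not targets))
    order = []
    while ready:
        item = ready.popleft()
        order.append(item)
        for referrer in referrers[item]:
            pending[referrer].discard(item)
            if not pending[referrer]:
                ready.append(referrer)

    if len(order) != len(pending):
        raise ValueError("circular references; no strict order exists")
    return order

example = {
    "region_1": ["protein_1", "cath_1"],
    "protein_1": [],
    "cath_1": [],
}
print(storage_order(example))
```

In this example the protein and the CATH classification come out before the region that references them.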
My model is simple,
<class name="Protein" is-interface="true">
</class>
<class name="ProteinRegion" is-interface="true">
  <attribute name="start" type="java.lang.Integer"/>
  <attribute name="end" type="java.lang.Integer"/>
</class>
<class name="StructuralDomainRegion" extends="ProteinRegion">
  <reference name="protein" referenced-type="Protein"/>
  <collection name="dataSets" referenced-type="DataSet" />
</class>
<class name="CathClassification" is-interface="true">
  <attribute name="cathCode" type="java.lang.String"/>
</class>
By the way, we are going to get a new machine with a better CPU
and more RAM;
I hope it will also improve the build performance.
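For a rough sense of the rates involved, the figures in this thread can be turned into objects per second. This is only a back-of-the-envelope sketch: the 31m total for my load is an assumption (16m assignments plus 15m proteins), and the helper name is made up.

```python
def objects_per_second(objects, hours):
    """Average store rate for a load of `objects` taking `hours`."""
    return objects / (hours * 3600)

# Figures quoted in the reply below.
metabolicmine = objects_per_second(120e6, 40)
arabidopsis = objects_per_second(300e6, 24)

# Assumed total for the gene3d load; it did not finish within 15 hours.
gene3d_ceiling = objects_per_second(31e6, 15)

print(f"metabolicMine: {metabolicmine:.0f} objects/s")
print(f"Arabidopsis:   {arabidopsis:.0f} objects/s")
print(f"gene3d load:   below {gene3d_ceiling:.0f} objects/s")
```

If the 31m assumption is about right, my load ran slower than either of the two builds mentioned in the reply.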
(2012/02/14 5:17), dev-request at intermine.org wrote:
> Date: Mon, 13 Feb 2012 15:48:24 +0000
> From: Richard Smith<richard at flymine.org>
> To: dev at intermine.org
> Subject: Re: [InterMine Dev] Integration of huge amount data
> Hi Chen,
> Those numbers should work fine, metabolicMine contains 120m objects but
> took 40 hours for the last build (including post-processing). I'm not
> sure why there was a big slow-down since the previous release.
> We have a demo Arabidopsis variation database that loaded over 300m
> objects in 24 hours.
> There can be many reasons for the load to run slowly, if you still have
> the intermine.log from the integrate directory where you ran the load
> that would be a big help.
> It's possible that without enough RAM the process was swapping. Also
> the order in which items are stored has an effect on speed, see:
> It also depends on exactly what is being stored (e.g. objects with
> large strings take longer) and how the data is modeled. Maybe you could
> send the additions file you used as well?
> On 13/02/2012 09:29, Chen Yian wrote:
>> Hi all,
>> Does anyone have any experience integrating large quantities of data
>> into the InterMine system?
>> I've tried to incorporate gene3d domain assignment, which contains 16m
>> assignments for 15m proteins.
>> I think I might have created more than 30m objects during integration.
>> When storing these items, it took a very long time and I couldn't finish
>> it within 15 hours (I finally gave up).
>> Any suggestions?
>> Thank you.
>> dev mailing list
>> dev at intermine.org