[InterMine Dev] Integration of huge amount data

Richard Smith richard at flymine.org
Mon Feb 13 15:48:24 GMT 2012


Hi Chen,
Those numbers should work fine.  For comparison, metabolicMine contains
120m objects, but the last build took 40 hours (including
post-processing).  I'm not sure why there was a big slow-down since the
previous release.

We have a demo Arabidopsis variation database that loaded over 300m
objects in 24 hours.
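
To put those two builds in perspective, here is a rough back-of-envelope
sketch of the average load rates and what they would imply for a load of
about 30m objects.  The function names are made up for illustration, and
real rates will vary with the data and the model, so treat the output as
an order-of-magnitude guide only.

```python
def objects_per_second(objects, hours):
    """Average load rate in objects per second over a whole build."""
    return objects / (hours * 3600)


def hours_at_rate(objects, rate):
    """Hours needed to store `objects` at `rate` objects per second."""
    return objects / rate / 3600


# Figures quoted in this thread (m = million).
builds = {
    "metabolicMine": (120e6, 40),
    "Arabidopsis demo": (300e6, 24),
}

# Approximate size of the gene3d load (an assumption: "more than 30m").
target = 30e6

for name, (objects, hours) in builds.items():
    rate = objects_per_second(objects, hours)
    estimate = hours_at_rate(target, rate)
    print(f"{name}: {rate:,.0f} objects/s, "
          f"30m objects in about {estimate:.1f} h")
```

At either rate a load of that size would be expected to finish well
within 15 hours, which is why the log is worth looking at.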

There can be many reasons for the load to run slowly.  If you still have
the intermine.log from the integrate directory where you ran the load,
that would be a big help.

It's possible that without enough RAM the process was swapping.  Also,
the order in which items are stored has an effect on speed, see:

	http://intermine.org/wiki/DataLoadingPerformance

It also depends on exactly what is being stored (e.g. objects with
large strings take longer) and how the data is modeled.  Maybe you could
send the additions file you used as well?

Thanks,
Richard.



On 13/02/2012 09:29, Chen Yian wrote:
> Hi all,
>
> Does anyone have any experience of integrating large quantities of data
> into the InterMine system?
> I've tried to incorporate gene3d domain assignment, which contains 16m
> assignments for 15m proteins.
> I think I might have created more than 30m objects during integration.
> When storing these items, it took a very long time and I couldn't finish
> it within 15 hours (I finally gave up).
> Any suggestions?
>
> Thank you.
>
> Best,
>
> Chen
>
>
> _______________________________________________
> dev mailing list
> dev at intermine.org
> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>



