[InterMine Dev] Is it useful to use RAMDirectory for indexing lucene keywords?

Justin Clark-Casey justincc at intermine.org
Mon Nov 21 14:26:45 GMT 2016

Hi Hongkee,

I believe (though I have not rigorously tested) that InterMine's Lucene indexing is CPU-bound rather than IO-bound, so I don't expect that using a
RAMDirectory would help much.  I'd be very interested in seeing numbers if you do try it, though.

The CPU bottleneck could perhaps be tackled more productively by spreading the indexing work over multiple cores.  As you can see from
KeywordSearch.createIndex(), the indexing is currently done on a single thread via InterMineObjectFetcher.  One could run 8 fetchers instead, for instance,
though a more significant code change would probably be required to split all the indexable InterMine objects into 8 workloads.
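
To illustrate the idea of splitting the work into 8 workloads, here is a minimal sketch in plain Java.  It is not InterMine code: the class ParallelIndexSketch and the methods partition() and buildDocuments() are hypothetical names, and the toDocument function just stands in for whatever per-object work the fetcher does.  It only shows the general pattern of partitioning a list of items and processing the slices on a fixed thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not part of InterMine.
public class ParallelIndexSketch {

    // Split items into at most `parts` contiguous slices of near-equal size.
    public static <T> List<List<T>> partition(List<T> items, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int size = items.size();
        int base = size / parts;   // minimum slice length
        int extra = size % parts;  // the first `extra` slices get one more item
        int start = 0;
        for (int i = 0; i < parts && start < size; i++) {
            int len = base + (i < extra ? 1 : 0);
            slices.add(new ArrayList<>(items.subList(start, start + len)));
            start += len;
        }
        return slices;
    }

    // Convert every item with `toDocument`, one slice per worker thread.
    // Results come back in the original item order.
    public static <T, D> List<D> buildDocuments(List<T> items, int workers,
                                                Function<T, D> toDocument) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<D>>> pending = new ArrayList<>();
            for (List<T> slice : partition(items, workers)) {
                Callable<List<D>> task = () -> {
                    List<D> docs = new ArrayList<>(slice.size());
                    for (T item : slice) {
                        docs.add(toDocument.apply(item));
                    }
                    return docs;
                };
                pending.add(pool.submit(task));
            }
            List<D> all = new ArrayList<>(items.size());
            for (Future<List<D>> f : pending) {
                all.addAll(f.get()); // blocks until that slice is finished
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("indexing worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            ids.add(i);
        }
        // Stand-in for the real per-object work (assumption for illustration).
        List<String> docs = buildDocuments(ids, 8, id -> "doc-" + id);
        System.out.println(docs.size() + " documents built");
    }
}
```

Lucene's IndexWriter is thread-safe, so in principle the workers could add documents to one shared writer rather than collecting them in a list.  Whether the object fetching itself can be partitioned this cleanly is the part that would need the larger code change.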

In any case, I should tell you that we're looking at updating the search approach, quite possibly by moving to Elasticsearch or Solr (currently
leaning towards Elasticsearch).  So indexing may be carried out differently, and I wouldn't want you to waste time on an approach (embedded Lucene) that may go
away.  That said, we still need to consider how to keep providing a good out-of-the-box search experience.

You can see some work by Colin Diesh that gets InterMine working with Solr instead of embedded Lucene here [1].

[1] https://github.com/intermine/intermine/issues/517

Justin Clark-Casey, Synbiomine/InterMine Developer

On 18/11/16 11:12, HongKee Moon wrote:
> Hi all,
> I am quite curious about using RAMDirectory for indexing Lucene keywords, because “postprocess” normally takes quite a long time.
> Do you think RAMDirectory would be a better/faster option for the “postprocess” task?
> Presumably, with RAMDirectory it would also be faster to write/gunzip the index files restored from the database after the webapp starts.
> If you are currently using RAMDirectory instead of FSDirectory to improve the performance of InterMine tasks, could you share your experience?
> Cheers,
> HongKee
> --
> HongKee Moon
> Software Engineer
> Scientific Computing Facility
> Max Planck Institute of Molecular Cell Biology and Genetics
> Pfotenhauerstr. 108
> 01307 Dresden
> Germany
> fon: +49 351 210 2740
> fax: +49 351 210 1689
> www.mpi-cbg.de <http://www.mpi-cbg.de>
> _______________________________________________
> dev mailing list
> dev at lists.intermine.org
> https://lists.intermine.org/mailman/listinfo/dev
