[InterMine Dev] Is it useful to use RAMDirectory for indexing lucene keywords?

Colin colin.diesh at gmail.com
Mon Nov 21 17:35:59 GMT 2016

Thanks for the comments, Justin. I also think the Solr/Elasticsearch route
is still interesting, and my branch has a small demo of using Solr.

With the existing Lucene code, I am not sure that it makes sense to use
RAMDirectory during loading/postprocessing, but I think figuring out the
"batch size" for committing the index to disk might be important.

Not sure if that is already optimized or not!
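
A minimal sketch of how the batch-size question could be explored without
touching the real index. IndexSink, CountingSink and BatchCommitSketch are
hypothetical names; IndexSink only stands in for the addDocument/commit
calls of a Lucene IndexWriter, and nothing here is InterMine's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two IndexWriter calls that matter here. */
interface IndexSink {
    void addDocument(String doc);
    void commit();
}

/** Counts commits so that different batch sizes can be compared. */
class CountingSink implements IndexSink {
    int pending = 0;        // documents added since the last commit
    int commits = 0;        // number of commit() calls so far
    int committedDocs = 0;  // documents covered by those commits

    @Override
    public void addDocument(String doc) {
        pending++;
    }

    @Override
    public void commit() {
        committedDocs += pending;
        pending = 0;
        commits++;
    }
}

class BatchCommitSketch {
    /** Adds every document, committing after each full batch and once more for any remainder. */
    static void indexInBatches(List<String> docs, int batchSize, IndexSink sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int sinceCommit = 0;
        for (String doc : docs) {
            sink.addDocument(doc);
            if (++sinceCommit == batchSize) {
                sink.commit();
                sinceCommit = 0;
            }
        }
        if (sinceCommit > 0) {
            sink.commit();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc-" + i);
        }
        for (int batchSize : new int[] {1, 3, 10}) {
            CountingSink sink = new CountingSink();
            indexInBatches(docs, batchSize, sink);
            System.out.println("batchSize=" + batchSize + " commits=" + sink.commits);
        }
    }
}
```

With 10 documents this reports 10, 4 and 1 commits for batch sizes 1, 3 and
10. Swapping CountingSink for a wrapper around a real IndexWriter and timing
each run would be one way to get actual numbers.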

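On Justin's point (quoted below) about splitting the indexable objects into
8 workloads, a rough sketch of the partitioning step might look like this.
ParallelFetchSketch, partition and indexInParallel are hypothetical names,
the object ids are plain integers, and the worker body is a placeholder for
the real per-object work; this is not how InterMineObjectFetcher is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFetchSketch {
    /** Splits items into at most `parts` contiguous slices whose sizes differ by at most one. */
    static <T> List<List<T>> partition(List<T> items, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<T>> slices = new ArrayList<>();
        int n = items.size();
        int base = n / parts;
        int extra = n % parts;
        int start = 0;
        for (int i = 0; i < parts && start < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            slices.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return slices;
    }

    /** Runs one worker per slice and returns the total number of objects processed. */
    static int indexInParallel(List<Integer> objectIds, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Integer> slice : partition(objectIds, workers)) {
                results.add(pool.submit(() -> {
                    int done = 0;
                    for (Integer id : slice) {
                        // Placeholder: fetch object `id` and build its document here.
                        done++;
                    }
                    return done;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add(i);
        }
        System.out.println("processed=" + indexInParallel(ids, 8));
    }
}
```

Lucene documents IndexWriter as thread-safe, so the workers could in
principle share one writer, but whether that helps depends on where the CPU
time actually goes.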

On Mon, Nov 21, 2016 at 8:26 AM, Justin Clark-Casey <justincc at intermine.org> wrote:

> Hi Hongkee,
> I believe (though I have not rigorously tested) that InterMine's Lucene
> indexing is CPU bound rather than IO bound. Therefore, I don't expect that
> using a RAMDirectory would help much, though I'd be very interested in
> seeing numbers if you do try it.
> One could maybe more productively tackle the CPU bottleneck by spreading
> the indexing work over multiple cores. At the moment, as you can see from
> KeywordSearch.createIndex(), the indexing is done on a single thread via
> InterMineObjectFetcher. One could have 8 fetchers instead, for instance,
> though a more significant code change is probably required to split all
> the indexable InterMine objects into 8 workloads.
> But in any case, I should tell you that we're currently looking at
> updating the search approach, quite possibly by moving to Elasticsearch or
> Solr (currently leaning towards Elasticsearch).  So indexing may be carried
> out differently and I wouldn't want you to waste time on an approach
> (embedded Lucene) that may go away.  That said, we still need to consider
> how to keep providing a good out-of-the-box search experience.
> You can see some work by Colin Diesh that gets InterMine working with Solr
> instead of embedded Lucene here [1].
> [1] https://github.com/intermine/intermine/issues/517
> --
> Justin Clark-Casey, Synbiomine/InterMine Developer
> http://synbiomine.org
> http://twitter.com/justincc
> On 18/11/16 11:12, HongKee Moon wrote:
>> Hi all,
>> I am quite curious about RAMDirectory for indexing Lucene keywords because
>> normally “postprocess” takes quite a long time.
>> Do you guys think RAMDirectory would be a better/faster option for doing
>> the “postprocess” task?
>> Supposedly, it must be faster to write/gunzip after restoring indexed
>> files from the database after the webapp starts with RAMDirectory.
>> Could you share your experience of using RAMDirectory instead of
>> FSDirectory if you are currently using it to improve the performance of
>> InterMine tasks?
>> Cheers,
>> HongKee
>> --
>> HongKee Moon
>> Software Engineer
>> Scientific Computing Facility
>> Max Planck Institute of Molecular Cell Biology and Genetics
>> Pfotenhauerstr. 108
>> 01307 Dresden
>> Germany
>> fon: +49 351 210 2740
>> fax: +49 351 210 1689
>> www.mpi-cbg.de <http://www.mpi-cbg.de>
>> _______________________________________________
>> dev mailing list
>> dev at lists.intermine.org
>> https://lists.intermine.org/mailman/listinfo/dev