[InterMine Dev] Is it useful to use RAMDirectory for indexing lucene keywords?

Colin colin.diesh at gmail.com
Mon Nov 21 17:35:59 GMT 2016


Thanks for the comments, Justin. I also think the Solr/Elasticsearch approach is
still interesting, and my branch has a small demo of using Solr.

With the existing Lucene code, I am not sure it makes sense to use
RAMDirectory during loading/postprocessing, but I think working out the right
"batch size" for committing the index to disk might be important.
http://stackoverflow.com/questions/11469131/batch-commit-for-lucene-index

I'm not sure whether that is already optimized!
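A minimal sketch of the batching idea, under simplifying assumptions: `IndexSink`, `BatchedIndexer` and `CountingSink` are hypothetical names, and `IndexSink` only mimics the `addDocument`/`commit` pair that Lucene's `IndexWriter` offers (it is not Lucene's API). Because each Lucene commit syncs the index files to disk, committing every N documents instead of after every document spreads that cost out; the batch size of 1000 is an arbitrary placeholder that would need measuring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two IndexWriter operations used here.
// This is NOT the Lucene API, just an interface with similar method names.
interface IndexSink {
    void addDocument(String doc);
    void commit();
}

// Adds documents one at a time but only commits once every `batchSize`
// documents, plus one final commit for any leftover partial batch.
final class BatchedIndexer {
    private final IndexSink sink;
    private final int batchSize;
    private int pending = 0;

    BatchedIndexer(IndexSink sink, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(String doc) {
        sink.addDocument(doc);
        pending++;
        if (pending == batchSize) {
            sink.commit();
            pending = 0;
        }
    }

    // Commit whatever remains from the last, partially filled batch.
    void finish() {
        if (pending > 0) {
            sink.commit();
            pending = 0;
        }
    }
}

// Test double that only counts calls, so batching can be checked without a real index.
final class CountingSink implements IndexSink {
    int added = 0;
    int commits = 0;
    public void addDocument(String doc) { added++; }
    public void commit() { commits++; }
}

class BatchedIndexerDemo {
    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        BatchedIndexer indexer = new BatchedIndexer(sink, 1000);
        for (int i = 0; i < 2500; i++) {
            indexer.add("object-" + i);
        }
        indexer.finish();
        // Commits happen after doc 1000, after doc 2000, and once more for the last 500.
        System.out.println(sink.added + " docs, " + sink.commits + " commits"); // prints "2500 docs, 3 commits"
    }
}
```

The batch size is a trade-off: larger batches mean fewer expensive syncs, but more work has to be redone if loading fails between commits.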

-Colin

On Mon, Nov 21, 2016 at 8:26 AM, Justin Clark-Casey <justincc at intermine.org> wrote:

> Hi Hongkee,
>
> I believe (though I have not rigorously tested), that InterMine's Lucene
> indexing is CPU bound rather than IO bound.  Therefore, I don't expect that
> using a RAMDirectory would help much, though I'd be very interested in
> seeing numbers if you do try it.
>
> One could perhaps tackle the CPU bottleneck more productively by spreading
> the indexing work over multiple cores.  At the moment, as you can see from
> KeywordSearch.createIndex(), indexing is done on a single thread via
> InterMineObjectFetcher.  One could have 8 fetchers instead, for instance,
> though splitting all the indexable InterMine objects into 8 workloads would
> probably require more significant code changes.
>
> But in any case, I should tell you that we're currently looking at
> updating the search approach, quite possibly by moving to Elasticsearch or
> Solr (currently leaning towards Elasticsearch).  So indexing may be carried
> out differently and I wouldn't want you to waste time on an approach
> (embedded Lucene) that may go away.  That said, we still need to consider
> how to keep providing a good out-of-the-box search experience.
>
> You can see some work by Colin Diesh that gets InterMine working with Solr
> instead of embedded Lucene here [1].
>
> [1] https://github.com/intermine/intermine/issues/517
>
> --
> Justin Clark-Casey, Synbiomine/InterMine Developer
> http://synbiomine.org
> http://twitter.com/justincc
>
>
> On 18/11/16 11:12, HongKee Moon wrote:
>
>> Hi all,
>>
>> I am quite curious about using RAMDirectory for indexing Lucene keywords,
>> because “postprocess” normally takes quite a long time.
>> Do you think RAMDirectory would be a better/faster option for the
>> “postprocess” task?
>>
>> Presumably, with RAMDirectory it should also be faster to write/gunzip the
>> indexed files restored from the database when the webapp starts.
>> Could you share your experience of using RAMDirectory instead of
>> FSDirectory, if you are currently using it to improve the performance of
>> InterMine tasks?
>>
>> Cheers,
>> HongKee
>>
>> --
>> HongKee Moon
>> Software Engineer
>> Scientific Computing Facility
>>
>> Max Planck Institute of Molecular Cell Biology and Genetics
>> Pfotenhauerstr. 108
>> 01307 Dresden
>> Germany
>>
>> fon: +49 351 210 2740
>> fax: +49 351 210 1689
>> www.mpi-cbg.de <http://www.mpi-cbg.de>
>>
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at lists.intermine.org
>> https://lists.intermine.org/mailman/listinfo/dev
>>
>
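Splitting the indexing into N workloads, as Justin suggests, could look roughly like this stdlib-only sketch. It rests on stated assumptions: `ParallelIndexer`, `partition` and `index` are hypothetical names, the indexable InterMine objects are modeled as plain strings, and the per-object "indexing" is a toy term count rather than Lucene analysis. It does not reflect how `KeywordSearch.createIndex()` or `InterMineObjectFetcher` work today. In a real Lucene setup the workers could share one `IndexWriter`, which Lucene documents as thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelIndexer {

    // Split `items` into `workers` contiguous slices whose sizes differ by at most one.
    static <T> List<List<T>> partition(List<T> items, int workers) {
        if (workers <= 0) throw new IllegalArgumentException("workers must be > 0");
        List<List<T>> slices = new ArrayList<>();
        int n = items.size();
        for (int w = 0; w < workers; w++) {
            int from = (int) ((long) n * w / workers);
            int to = (int) ((long) n * (w + 1) / workers);
            slices.add(items.subList(from, to));
        }
        return slices;
    }

    // Hypothetical stand-in for indexing: each worker lower-cases and splits the
    // text of the objects in its slice and counts terms in a shared, thread-safe map.
    static Map<String, Integer> index(List<String> objects, int workers) {
        Map<String, Integer> termCounts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (List<String> slice : partition(objects, workers)) {
                results.add(pool.submit(() -> {
                    for (String text : slice) {
                        for (String term : text.toLowerCase().split("\\s+")) {
                            if (!term.isEmpty()) {
                                termCounts.merge(term, 1, Integer::sum); // atomic in ConcurrentHashMap
                            }
                        }
                    }
                }));
            }
            // Wait for every worker and surface any failure.
            for (Future<?> f : results) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("indexing worker failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return termCounts;
    }
}

class ParallelIndexerDemo {
    public static void main(String[] args) {
        List<String> objects = Arrays.asList("BRCA1 gene", "brca2 Gene", "p53 protein", "gene ontology");
        Map<String, Integer> counts = ParallelIndexer.index(objects, 8);
        System.out.println(counts.get("gene")); // prints 3
    }
}
```

Contiguous slices keep the split simple; if some object classes are much more expensive to index than others, handing out smaller chunks from a shared queue would balance the load across cores better.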