[InterMine Dev] setting residues in the direct loader

Joe Carlson jwcarlson at lbl.gov
Fri Apr 17 16:24:21 BST 2015


Thanks. That’s what I was looking for.

I’m using this for loading the protein families. Each family is a collection of proteins along with a clustalw alignment, an HMM, and some other metadata. I saw that I was spending a large amount of time resolving references to the genes and proteins that each family points to. I was hoping that by prefetching all those references I could speed this up.
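
A minimal sketch of the prefetching idea, not InterMine code: the class PrefetchSketch, the method prefetch, and the batchLookup function are all hypothetical names made up for illustration. batchLookup stands in for whatever single query can return the internal ids for a whole list of gene or protein identifiers, so that the references are resolved once per batch instead of once per object.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustration only: none of these names come from InterMine. */
final class PrefetchSketch {

    private PrefetchSketch() {
        // static helper, not meant to be instantiated
    }

    /**
     * Resolve every identifier up front, issuing one lookup call per batch.
     *
     * @param identifiers the gene/protein identifiers the families refer to
     * @param batchSize   how many identifiers to send in one lookup
     * @param batchLookup hypothetical stand-in for a query that maps a list
     *                    of identifiers to their stored ids
     * @return a map from identifier to resolved id, for use while loading
     */
    static Map<String, Integer> prefetch(
            Collection<String> identifiers,
            int batchSize,
            Function<List<String>, Map<String, Integer>> batchLookup) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Integer> resolved = new HashMap<>();
        List<String> batch = new ArrayList<>(batchSize);
        for (String identifier : identifiers) {
            batch.add(identifier);
            if (batch.size() == batchSize) {
                // Batch is full: resolve it with a single lookup call.
                resolved.putAll(batchLookup.apply(batch));
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Resolve the final, partially filled batch.
            resolved.putAll(batchLookup.apply(batch));
        }
        return resolved;
    }
}
```

With the map in hand, each family would look its genes and proteins up in memory rather than going back to the database per reference. Whether that is actually faster depends on how many of the references are shared between families.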

I’ll let you know how it goes.

Joe

> On Apr 17, 2015, at 2:05 AM, Richard Smith <richard at flymine.org> wrote:
> 
> Hi Joe,
> I would take a look at FastaLoaderTask.java which does the same thing. It
> looks like it creates a PendingClob with a string of the sequence:
> 
> flymineSequence.setResidues(new PendingClob(sequence));
> 
> 
> Which sources are you using a direct data loader for?  I'm interested to
> know as I expect it to be faster only in certain circumstances.
> 
> I have a change which isn't quite ready yet that makes the
> DirectDataLoader run primary key queries in batches (using a
> ParallelBatchingFetcher). In my small test it was about 2x faster, but it
> depends what percentage of the objects need merging.
> 
> Cheers,
> Richard
> 
> 
>> Hi Richard,
>> 
>> I’m happily using the direct data loader now, but I realize there is one
>> thing I can’t figure out tonight. I’m creating some objects with
>> sequence, but it isn’t clear how to set the residues for these things.
>> 
>> setResidues in Sequence.java expects a ClobAccess argument. But it’s not
>> clear how to create one of these from scratch.
>> 
>> Can you give me a pointer on this?
>> 
>> Thanks,
>> 
>> Joe
>> 
>> 
>> 
> 



