[InterMine Dev] question about an error message

Joe Carlson jwcarlson at lbl.gov
Mon Apr 20 06:27:50 BST 2015


Sorry that this is a duplicate: I originally sent it from my non-registered email address, and it got flagged as needing moderation.



Hi Richard (and gang),

I have a question about an error message I’m seeing. Some background: as you know, I’m trying to speed up loading. I want to use a DirectDataLoader to load our protein families as fast as possible. My idea was that if I did all my queries for existing gene and protein records up front, then at load time I could create each ProteinFamily object with its references to the genes and proteins in the production database pre-filled.
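To make the idea concrete, here is a minimal sketch of the up-front lookup in plain Java. Everything in it is a stand-in that I made up for illustration: GeneRef, Family, buildLookup and makeFamily are hypothetical names, not InterMine classes or methods, and the real loader would use the model classes plus either ProxyReferences or the cached objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrefetchSketch {

    /** Hypothetical stand-in for a reference to an already-stored gene (not an InterMine class). */
    public static final class GeneRef {
        public final int id;            // database id returned by the up-front query
        public final String identifier; // identifier used to match incoming data

        public GeneRef(int id, String identifier) {
            this.id = id;
            this.identifier = identifier;
        }
    }

    /** Hypothetical stand-in for the new object being loaded. */
    public static final class Family {
        public final String name;
        public final List<GeneRef> genes = new ArrayList<>();

        public Family(String name) {
            this.name = name;
        }
    }

    /** Runs once before loading: index the query results (identifier -> id) by identifier. */
    public static Map<String, GeneRef> buildLookup(Map<String, Integer> queryResults) {
        Map<String, GeneRef> lookup = new HashMap<>();
        for (Map.Entry<String, Integer> row : queryResults.entrySet()) {
            lookup.put(row.getKey(), new GeneRef(row.getValue(), row.getKey()));
        }
        return lookup;
    }

    /** Runs at load time: fill in the references from memory, with no database access. */
    public static Family makeFamily(String name, List<String> memberIdentifiers,
                                    Map<String, GeneRef> lookup) {
        Family family = new Family(name);
        for (String identifier : memberIdentifiers) {
            GeneRef ref = lookup.get(identifier);
            if (ref == null) {
                // Fail loudly rather than fall back to a query or create a duplicate.
                throw new IllegalStateException("No stored gene for " + identifier);
            }
            family.genes.add(ref);
        }
        return family;
    }
}
```

The point of the design is that every reference is resolved from the in-memory map, so nothing at load time should need to go back to the production database; the cost is holding the whole map in memory.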

I would have thought that removing the keys to the genes and proteins from the data loader’s primary keys file would prevent any query to the database during loading. Indeed, when I remove all of those keys, the integration step does not query the production db during the insertions. This is what I want, and as far as I can tell no unnecessary queries are happening.

However, when I run the integration step I get an ObjectStoreException: Some skeletons were not replaced by real objects: 2671330

There are a couple of things I’m not clear on; one of them is the notion of pure objects versus skeleton objects. There is a cryptic comment in IntegrationWriterDataTrackingImpl.close() about this error message which I don’t quite understand.

I’ve tried this two ways: by creating ProxyReferences, and in the more memory-heavy way of querying and then keeping the gene and protein objects in a hash. In both cases, I get the message.

When I run this with the memory-heavy method, I see that I have duplicated genes and proteins in the production db, even though I never call store on the genes or proteins.

So my questions are: 1) What does this error message mean? 2) If I query in advance for all the objects that my new data objects will point to, how can I avoid having to do other queries at load time?

Thanks. I appreciate all your help,

Joe Carlson 