[InterMine Dev] question about an error message

Joe Carlson jwcarlson at lbl.gov
Wed Apr 22 06:50:35 BST 2015


Hi Richard,

This is starting to make more sense. I’m thinking that the error message is really what I want to get. Not creating real objects from proxy references is what I’d like to do.

So I was thinking of subclassing ProxyReference to make a non-storable ProxyReference. Something that the store method would just ignore. Do you think there is a problem with this approach?
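
To make the idea concrete, here is a minimal sketch in Java using toy stand-in classes. None of these are InterMine's real classes or signatures: ToyProxy, NonStorableToyProxy, ToyWriter and the storable() hook are all hypothetical names invented for illustration, and the skeleton bookkeeping is only a simplified model of the behaviour described in this thread (a reference to a not-yet-stored object leaves a skeleton, and close() complains if any skeleton was never replaced).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Small demo driver for the toy model below. */
class SkeletonDemo {
    public static void main(String[] args) {
        // Ordinary proxy: a skeleton is recorded and never replaced.
        ToyWriter strict = new ToyWriter();
        strict.storeWithReference("family1.gene", new ToyProxy(42));
        try {
            strict.close();
        } catch (IllegalStateException e) {
            System.out.println("ordinary proxy: " + e.getMessage());
        }

        // Non-storable proxy: only the foreign key is filled in.
        ToyWriter relaxed = new ToyWriter();
        relaxed.storeWithReference("family1.gene", new NonStorableToyProxy(42));
        relaxed.close();
        System.out.println("non-storable proxy: fk = " + relaxed.foreignKey("family1.gene"));
    }
}

/** Hypothetical stand-in for a reference to an object known only by id. */
class ToyProxy {
    final int id;
    ToyProxy(int id) { this.id = id; }
    /** Ordinary proxies ask the writer to keep a skeleton for their target. */
    boolean storable() { return true; }
}

/** The proposed variant: usable for the foreign key, otherwise ignored. */
class NonStorableToyProxy extends ToyProxy {
    NonStorableToyProxy(int id) { super(id); }
    @Override boolean storable() { return false; }
}

/** Toy writer modelling the skeleton check; an assumption, not InterMine code. */
class ToyWriter {
    private final Set<Integer> realIds = new HashSet<>();
    private final Set<Integer> skeletonIds = new HashSet<>();
    private final Map<String, Integer> foreignKeys = new HashMap<>();

    /** Storing the real object replaces any skeleton with the same id. */
    void storeReal(int id) {
        realIds.add(id);
        skeletonIds.remove(id);
    }

    /** Fill in a foreign key; record a skeleton only for storable proxies. */
    void storeWithReference(String field, ToyProxy ref) {
        if (ref.storable() && !realIds.contains(ref.id)) {
            skeletonIds.add(ref.id);
        }
        foreignKeys.put(field, ref.id);
    }

    Integer foreignKey(String field) { return foreignKeys.get(field); }

    /** Fail if any skeleton is still waiting for its real object. */
    void close() {
        if (!skeletonIds.isEmpty()) {
            throw new IllegalStateException(
                "Some skeletons were not replaced by real objects: " + skeletonIds.size());
        }
    }
}
```

In this toy model the non-storable proxy still supplies the id for the foreign key column, but it never registers a skeleton, so close() has nothing to complain about. Whether the real store method can be made to skip a ProxyReference subclass this cleanly is exactly the open question.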

Joe

> On Apr 21, 2015, at 7:44 AM, Richard Smith <richard at flymine.org> wrote:
> 
> Hi Joe,
> When an object is stored, any objects it references are stored first so the
> correct ids can be inserted in foreign key columns. If the referenced
> object has already been stored then the new target id is known; if it
> hasn't then a skeleton object is stored.
> 
> The skeleton fills in enough fields to store the object but the loader
> expects the full object to be stored later in the load. By storing the
> skeleton and waiting for the real object we don't have to pause the load
> to go looking for referenced objects in the source data every time one is
> seen.
> 
> In the case where you created and referenced actual objects (not
> ProxyReferences), these were stored, but without any integration keys, so
> you ended up with duplicate objects.
> 
> In both cases you get the un-replaced skeleton object error because a
> referenced object or ProxyReference has been stored without storing the
> actual object. Hopefully that makes it a little clearer what is happening,
> even if it doesn't actually solve your problem.
> 
> Tomorrow I hope to finish a change to make the DirectDataLoader use a
> ParallelBatchingFetcher - to group the integration queries into
> configurable batch sizes as is done in standard data loading. I think that
> will achieve a similar result to the code you've been working on.
> 
> The alternative would be to add an "I'm doing something weird but let me
> get on with it" flag to allow you to store ProxyReferences to fill in
> foreign keys without getting the skeletons error.
> 
> All the best,
> Richard.
> 
>> Hi Richard (and gang),
>> 
>> I have a question about an error message I’m seeing. A little
>> background. As you know, I’m trying to speed up the loading. What I’m
>> trying to do is to use a DirectDataLoader to load our protein families as
>> fast as possible. I was thinking that if I could do all my queries for
>> existing gene and protein records up front, then when I do my loading, I
>> can create a ProteinFamily object with the references to the genes and
>> proteins in the production database pre-filled.
>> 
>> I would have thought that removing the keys to the genes and proteins from
>> the data loader’s primary keys file would prevent any query to the
>> database during the loading. I’ve seen that when I remove all the keys
>> to genes and proteins then the integration step does not query the
>> production db during the insertions. This is what I want. And as far as I
>> can tell, there are no unnecessary queries happening.
>> 
>> I’ve run the integration step and I get an ObjectStoreException: Some
>> skeletons were not replaced by real objects: 2671330
>> 
>> There are a couple of things I’m not clear on; one of them is the
>> notion of pure objects versus skeleton objects. There is a cryptic comment
>> in IntegrationWriterDataTrackingImpl.close() about this error message
>> which I don’t quite understand.
>> 
>> I’ve tried this two ways: by creating ProxyReferences, and with the more
>> memory-heavy way of querying, then keeping the gene and protein objects
>> in a hash. In both cases, I get the message.
>> 
>> When I run this with the memory-heavy method, I see that I have duplicated
>> genes and proteins in the production db, even though I never call store on
>> the genes or proteins.
>> 
>> So what I was wondering is 1) what does this error message mean? and 2) If
>> I query for all the objects in advance that my new data objects will point
>> to, how can I avoid having to do other queries during load time?
>> 
>> Thanks. I appreciate all your help,
>> 
>> Joe Carlson
>> 
> 



