[InterMine Dev] question about an error message

Joe Carlson jwcarlson at lbl.gov
Tue Apr 21 17:05:34 BST 2015



On 04/21/2015 07:44 AM, Richard Smith wrote:
> Hi Joe,
> When an object is stored any objects it references are stored first so the
> correct ids can be inserted in foreign key columns. If the referenced
> object has already been stored then the new target id is known; if it
> hasn't then a skeleton object is stored.
>
> The skeleton fills in enough fields to store the object but the loader
> expects the full object to be stored later in the load. By storing the
> skeleton and waiting for the real object we don't have to pause the load
> to go looking for referenced objects in the source data every time one is
> seen.
>
> In the case where you created and referenced actual objects (not
> ProxyReferences) these were stored but without any integration keys you
> ended up with duplicate objects.
>
> In both cases you get the un-replaced skeleton object error as a
> referenced object or ProxyReference has been stored without storing the
> actual object. Hopefully that makes it a little clearer what is happening,
> even if it doesn't actually solve your problem.
>
> Tomorrow I hope to finish a change to make the DirectDataLoader use a
> ParallelBatchingFetcher - to group the integration queries into
> configurable batch sizes as is done in standard data loading. I think that
> will achieve a similar result to the code you've been working on.
>
> The alternative would be to add an "I'm doing something weird but let me
> get on with it" flag to allow you to store ProxyReferences to fill in
> foreign keys without getting the skeletons error.

Ah. Thanks. This is starting to make sense now. I had been thinking that
if a ProxyReference (or an actual object) had an id, the data loader
should have known that it was already stored. I think that fits along
the lines of "I'm doing something weird..." The downside of doing this,
I suppose, is that any new fields set on the ProxyReference will not get
integrated.
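
To check my understanding of the skeleton bookkeeping, here is a toy
sketch in Python. It is not InterMine code: the class, its methods and
the in-memory dictionaries are all hypothetical, and I am assuming the
number in the error message is a count. It only models the behaviour
described above: a referenced object that has not been stored yet gets a
placeholder id, and close() complains if a placeholder was never replaced
by a full store.

```python
class ToyIntegrationWriter:
    """Toy model of skeleton tracking (hypothetical, not the InterMine API)."""

    def __init__(self):
        self._next_id = 1
        self.ids = {}            # source key -> assigned database id
        self.skeletons = set()   # ids that so far exist only as placeholders

    def _id_for(self, key):
        # Hand out a new id the first time a key is seen.
        if key not in self.ids:
            self.ids[key] = self._next_id
            self._next_id += 1
        return self.ids[key]

    def store(self, key, references=()):
        """Store a full object; referenced objects are given ids first."""
        foreign_keys = []
        for ref in references:
            if ref not in self.ids:
                # Not stored yet: write a skeleton so the foreign key
                # column can be filled in now.
                self.skeletons.add(self._id_for(ref))
            foreign_keys.append(self.ids[ref])
        obj_id = self._id_for(key)
        # A full store replaces any skeleton written earlier for this key.
        self.skeletons.discard(obj_id)
        return obj_id, foreign_keys

    def close(self):
        # Mirrors the check that produced the error in my load.
        if self.skeletons:
            raise RuntimeError(
                "Some skeletons were not replaced by real objects: %d"
                % len(self.skeletons))
```

In this model, storing a ProteinFamily that references a gene and then
closing without ever storing the gene raises the error. Storing the gene
afterwards clears the placeholder and close() succeeds. That matches what
I see: my referenced genes and proteins are never stored as full objects
during the load.
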
> All the best,
> Richard.
>
>> Hi Richard (and gang),
>>
>> I have a question about an error message I’m seeing. A little
>> background. As you know, I’m trying to speed up the loading. What I’m
>> trying to do is to use a DirectDataLoader to load our protein families as
>> fast as possible. I was thinking that if I could do all my queries for
>> existing gene and protein records up front, then when I do my loading, I
>> can create a ProteinFamily object with the references to the genes and
>> proteins in the production database pre-filled.
>>
>> I would have thought that removing the keys to the genes and proteins from
>> the data loader’s primary keys file would prevent any query to the
>> database during the loading. I’ve seen that when I remove all the keys
>> to genes and proteins then the integration step does not query the
>> production db during the insertions. This is what I want. And as far as I
>> can tell, there are no unnecessary queries happening.
>>
>> I’ve run the integration step and I get an ObjectStoreException:  Some
>> skeletons were not replaced by real objects: 2671330
>>
>> There are a couple of things I’m not clear on; one of them is the
>> notion of pure objects versus skeleton objects. There is a cryptic comment
>> in IntegrationWriterDataTrackingImpl.close() about this error message
>> which I don’t quite understand.
>>
>> I’ve tried this two ways: by creating ProxyReferences, and the more
>> memory-heavy way of querying and then keeping the gene and protein objects
>> in a hash. In both cases, I get the message.
>>
>> When I run this with the memory-heavy method, I see that I have duplicated
>> genes and proteins in the production db, even though I never call store on
>> the genes or proteins.
>>
>> So what I was wondering is 1) what does this error message mean? and 2) If
>> I query for all the objects in advance that my new data objects will point
>> to, how can I avoid having to do other queries during load time?
>>
>> Thanks. I appreciate all your help,
>>
>> Joe Carlson
>> _______________________________________________
>> dev mailing list
>> dev at intermine.org
>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>



