[InterMine Dev] strange database record corruption

Richard Smith richard at flymine.org
Tue Jul 7 17:53:54 BST 2015


Hi Joe,
I haven't actually run this example yet but I think (hope) the problem is
clear.

The ids in the source data MUST be unique and internally consistent. You
can't store two different objects with the same id, which is effectively
what's happening.

The items database normally takes care of this. In your case the ids from
the 'source database' are being generated both from some objects that
already exist in the production db and sequentially generated integers, so
there's no guarantee of uniqueness.
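
To make the failure mode concrete, here is a minimal, self-contained sketch.
It is not InterMine code: the class IdCollisionDemo, the method
firstCollision and the plain HashMap standing in for the idMap are all
hypothetical, and the production ids it makes up (2000000 + source id) are
for illustration only. It just shows that sequential source ids eventually
hit an id that was pre-fetched from the production db.

```java
import java.util.HashMap;
import java.util.Map;

public class IdCollisionDemo {

    /**
     * Simulates storing objects with sequential source ids 0..lastSourceId
     * while the id map already holds one id pre-fetched from production.
     * Returns the first source id that is already mapped, or -1 if none is.
     */
    public static int firstCollision(int prefetchedId, int lastSourceId) {
        // Hypothetical stand-in for the source-id -> production-id map.
        Map<Integer, Integer> idMap = new HashMap<>();
        idMap.put(prefetchedId, prefetchedId); // existing production object

        for (int sourceId = 0; sourceId <= lastSourceId; sourceId++) {
            if (idMap.containsKey(sourceId)) {
                return sourceId; // two different objects now share one id
            }
            idMap.put(sourceId, 2000000 + sourceId); // made-up production id
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same numbers as the TroubleMaker example quoted below.
        System.out.println(firstCollision(1000030, 1000030));
        // One object fewer and the collision is never reached.
        System.out.println(firstCollision(1000030, 1000029));
    }
}
```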

The solution would be to replace the sequential id generation with a
generator that knows about all ids pre-fetched from the production
database and avoids assigning them again.
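
A rough sketch of what such a generator could look like is below. The class
name ReservedIdGenerator and its reserve/nextId methods are hypothetical and
are not part of DirectDataLoader or IntegrationWriter; the sketch also
assumes every pre-fetched id is reserved before the first id is handed out.

```java
import java.util.HashSet;
import java.util.Set;

public class ReservedIdGenerator {
    // Ids pre-fetched from the production database (must never be reused).
    private final Set<Integer> reserved = new HashSet<>();
    private int next = 0;

    /** Marks an id pre-fetched from production as unavailable. */
    public void reserve(int id) {
        reserved.add(id);
    }

    /** Returns the next sequential id that has not been reserved. */
    public int nextId() {
        while (reserved.contains(next)) {
            next++; // skip over anything already known from production
        }
        return next++;
    }

    public static void main(String[] args) {
        ReservedIdGenerator gen = new ReservedIdGenerator();
        gen.reserve(1);
        gen.reserve(2);
        System.out.println(gen.nextId()); // 0
        System.out.println(gen.nextId()); // 3 (1 and 2 are skipped)
        System.out.println(gen.nextId()); // 4
    }
}
```

The cost is one set lookup per new object, which should be small compared
with a query against the objectstore.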

However, I'd still recommend trying the batching DirectDataLoader changes
first.

Let me know if I've missed the point.

Cheers,
Richard.


> Hi Richard,
>
> I hope you had a good day off on the Fourth.
>
> Oh. yeah. that's right. Never mind. Sorry about that whole unpleasantness.
>
> Anyway, I think I see what the issue is with the id collision a little
> bit better. It's basically a problem in which I'm hoping to avoid doing
> too many queries to the objectstore when I'm inserting new objects which
> reference existing data. I was thinking that if I could pre-fetch all
> the existing references ahead of time and hash them, my insertions would
> go much faster. That seems to help a lot on some loaders - I think
> because I am not doing the queries for existing objects one-at-a-time
> when I need them. Querying for all the existing objects will go faster
> if I can retrieve everything rather than doing it one-by-one. But the
> problem is that this introduces a possibility of id collisions.
>
> I've attached some of the source files for bio/sources/TroubleMaker.
> (google groups does not allow me to send .tgz files; so I renamed it.)
> You can toss this into your malariamine demo's project.xml to see it
> mess things up. This is a silly example of the phenomenon I'm seeing.
>
>    <sources>
>      <source name="uniprot-malaria" type="uniprot">
>        <property name="uniprot.organisms" value="36329"/>
>        <property name="createinterpro" value="true"/>
>        <property name="creatego" value="true"/>
>        <property name="src.data.dir"
> location="/global/u1/j/jcarlson/mal/malaria/uniprot/"/>
>      </source>
>      <source name="trouble-maker-malaria" type="TroubleMaker">
>        <property name="src.data.dir"
> value="/global/u1/j/jcarlson/mal/malaria/uniprot/" />
>        <property name="src.data.dir.includes" value="*" />
>        <property name="dataSourcename" value="Trouble" />
>        <property name="dataSetTitle" value="Trouble" />
>      </source>
>    </sources>
>
> (the value of src.data.dir for TroubleMaker is irrelevant. But since
> I've subclassed the FileDirectDataLoader, it needs to reference some
> file.)
>
> TroubleMaker is a silly thing that just causes an id collision.
> malariamine has the organism set with id=1000030. I first retrieve a
> list of organisms from the mine and hash it. Then I store a bunch of
> genes. In this implementation, I'm just making up the data and creating
> genes with a primary identifier equal to an integer:
>
>>  public void processFile(File f) {
>>
>>     retrieveOrganism();
>>     for(int i=0;i<=1000030;i++) {
>>       Gene g;
>>       try {
>>         g = getDirectDataLoader().createObject(Gene.class);
>>         g.setPrimaryIdentifier(Integer.toString(i));
>>         g.proxyOrganism(organismProxy.get(Integer.valueOf(36329)));
>>         getDirectDataLoader().store(g);
>>       } catch (ObjectStoreException e) {
>>         throw new BuildException("Trouble storing gene: " + e.getMessage());
>>       }
>>     }
>>
>>   }
>
>
> If I store 1000030 genes I can introduce a hybrid object with id=1000030:
>
>> malariamine=# select * from intermineobject where id=1000030;
>> object |   id    | class
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------------------------------------------------------------
>>  $_^org.intermine.model.bio.Gene
>> org.intermine.model.bio.Organism$_^aid$_^1000030$_^rorganism$_^1000030$_^aprimaryIdentifier$_^1000030$_^ataxonId$_^36329
>> | 1000030 | org.intermine.model.bio.Gene
>> org.intermine.model.bio.Organism
>> (1 row)
>>
>> malariamine=# select * from gene where id=1000030;
>>  secondaryidentifier | symbol | briefdescription | primaryidentifier |
>> length |   id    | name | scoretype | score | description |
>> chromosomeid | chromosomelocationid | organismid |
>> downstreamintergenicregionid | upstreamintergenicregionid | sequenceid
>> | sequenceontologytermid | class
>> ---------------------+--------+------------------+-------------------+--------+---------+------+-----------+-------+-------------+--------------+----------------------+------------+------------------------------+----------------------------+------------+------------------------+---------------------------------------------------------------
>>                      |        |                  | 1000030
>> |        | 1000030 |      |           | |             |
>> |                      |    1000030 |                              |
>> |            |                        | org.intermine.model.bio.Gene
>> org.intermine.model.bio.Organism
>> (1 row)
>>
>> malariamine=# select * from organism where id=1000030;
>>  shortname | taxonid | genus | commonname | species |   id    | name |
>> class
>> -----------+---------+-------+------------+---------+---------+------+---------------------------------------------------------------
>>            |   36329 |       |            |         | 1000030 |      |
>> org.intermine.model.bio.Gene org.intermine.model.bio.Organism
>> (1 row)
>
> The issue is really what to do in the retrieveOrganism method. I query
> the db and create proxy references in my hash:
>
>> private void retrieveOrganism() {
>>     Query q = new Query();
>>     QueryClass qC = new QueryClass(Organism.class);
>>     q.addFrom(qC);
>>     QueryField qFPid = new QueryField(qC,"taxonId");
>>     QueryField qFId = new QueryField(qC,"id");
>>     q.addToSelect(qFPid);
>>     q.addToSelect(qFId);
>>
>>     try {
>>       Results res = getIntegrationWriter().getObjectStore().execute(q,1000,false,false,false);
>>       Iterator<Object> resIter = res.iterator();
>>       while (resIter.hasNext()) {
>>         @SuppressWarnings("unchecked")
>>         ResultsRow<Object> rr = (ResultsRow<Object>) resIter.next();
>>         Integer taxonId = (Integer)rr.get(0);
>>         Integer id = (Integer)rr.get(1);
>>         organismProxy.put(taxonId,new ProxyReference(getIntegrationWriter().getObjectStore(),id,Organism.class));
>>       }
>>     } catch (Exception e) {
>>       throw new BuildException("Problem in prefilling Organism ProxyReferences: " + e.getMessage());
>>     }
>>   }
>
>
> After I store my first gene, idMap in IntegrationWriterAbstractImpl has
> 2 elements: {0 -> 2000000, 1000030 -> 1000030}. The first is the gene I
> just stored, and the second is the organism from the hash. When I get to
> gene with the id 1000030, I do not make object 2000030, but I use the
> 'equivalent object' 1000030.
>
> So I guess the question is what is the best way to create proxy
> references to things that already exist in the objectstore? I've tried
> various things such as creating new objects and storing them but was
> getting other errors. Some sort of batch query ahead of time would
> greatly speed things up. As you've mentioned before, getting this to
> work depends on the order of the loading steps. This is true and in my
> real loaders I think I take this into account and create objects as
> needed if I don't have a proxy reference. So things will go slower if I
> don't have a hashed reference; I can live with that.
>
> Do you have a suggestion for how I can create the proxy reference hash
> and not have a possibility of an id collision?
>
> Thanks
>
> Joe
>
>
>
> On 07/02/2015 09:41 AM, Richard Smith wrote:
>> Hi Joe,
>> Yes, there was an on-demand multiple inheritance feature. I'm not
>> convinced it actually works properly which is why you see the strange
>> object serialisation. Not all of us thought it was a good idea at the
>> time
>> and it still isn't :)
>>
>> Glad to hear you've had an improvement in loading speed on the new
>> hardware. I hope in the 1.6 release we'll pull together several small
>> performance improvements and will be able to see if that helps some
>> more.
>>
>> I'm still surprised that you see an id collision, let us know if you
>> work
>> out why.
>>
>> Cheers,
>> Richard.
>>
>>
>>
>>> Hi Richard,
>>>
>>> Thanks for the email. I did a few more experiments after sending that
>>> last
>>> email and, while I still don’t quite understand it all, at least I
>>> have
>>> something that is working and just wanted to get the loading done
>>> before
>>> poking at it again.
>>>
>>> I took out all of my idMap manipulation and still saw a problem. I had
>>> only been using it to keep a record of objects that I had retrieved
>>> from
>>> the database and kept around as proxy references. I was trying to save
>>> the
>>> time of having to do a query again when it came time to save the
>>> objects
>>> that referenced those things. So, other than taking a lot longer, the
>>> loading failed in the same manner.
>>>
>>> I had been wondering about the fact that the class for the new thing
>>> had
>>> both classes listed; I was thinking you were going to have some sort of
>>> multiple inheritance going on. Might have seemed like a good idea at
>>> the
>>> time, but I imagine it would be a pain to get the serialization to work
>>> out.
>>>
>>> The good news is that we’ve upgraded some hardware here and the
>>> slowdown
>>> that we’d seen in the past has gone away. We’re seeing a performance
>>> dip comparable to what you had: a slight dip but nothing horrible. And
>>> the
>>> entire loading process is down to 48 hours or so. Getting better.
>>> (Though
>>> I still need to do the transfer-sequence post processing step to see
>>> how
>>> long that takes; that was another bottleneck in the loading.)
>>>
>>> If I have any more experience and thoughts about the id collision
>>> thing,
>>> I’ll let you know.
>>>
>>> joe
>>>
>>>> On Jul 1, 2015, at 8:29 AM, Richard Smith <richard at flymine.org> wrote:
>>>>
>>>> Hi Joe,
>>>> Object ids in the target database are assigned when an object is
>>>> stored,
>>>> the ids are fetched (in batches) from a sequence in the database which
>>>> autoincrements. The same id will never be assigned to two different
>>>> new
>>>> objects.
>>>>
>>>> I think I see what's happening with the DirectDataLoader and the
>>>> pro-user
>>>> idMap manipulation though.
>>>>
>>>> When objects are loaded from an items database they have ids in the
>>>> source
>>>> items database, when stored they are assigned a new id in the
>>>> production
>>>> database. The idMap maps between the id the item had and the id
>>>> assigned
>>>> in the target database. This means references from the original items
>>>> are
>>>> preserved, if organism item 101 is stored and gets an object id of 201
>>>> we
>>>> know a gene item referencing organism 101 should reference organism
>>>> object
>>>> 201 in the production db.
>>>>
>>>> With the DirectDataLoader there aren't any source ids because there's
>>>> no
>>>> items database. That doesn't matter, we just assign ids sequentially
>>>> as
>>>> objects are created. These aren't the ids stored in the production
>>>> database.
>>>>
>>>> I forget exactly what you're doing to manipulate the idMap but
>>>> presumably
>>>> you're pre-populating it with known ids from the production database.
>>>> At
>>>> some point these ids are colliding with the throwaway source ids
>>>> generated
>>>> in DirectDataLoader.
>>>>
>>>> As for solutions - I don't think decrementing new ids in
>>>> DirectDataLoader
>>>> is a good idea as valid ids can be negative. You're right that calling
>>>> IntegrationWriter.getSerial() will throw away ids, potentially quite a
>>>> lot. A better fix might be to provide the source ids you've put in the
>>>> idMap to DirectDataLoader and tell it not to assign any of those.
>>>>
>>>> Oh, and the mangled object that was created is a 'feature' - it's
>>>> possible
>>>> to store dynamic objects that combine multiple classes in ways not
>>>> defined
>>>> in the model. We don't use this (on purpose) and will hopefully remove
>>>> it
>>>> soon.
>>>>
>>>> Hope this helps,
>>>> Richard.
>>>>
>>>>
>>>>> Hello again,
>>>>>
>>>>> Sorry if that email was confusing. I tend to write semi-coherent
>>>>> email late at night when I’m about to call it a night. And I was
>>>>> hoping to catch you folks before the weekend.
>>>>>
>>>>> The issue appears to be a collision in the id fields between the one
>>>>> generated by DirectDataLoader.createObject and an id for an object
>>>>> already in the database. I’m loading ~ 1M records (100K families,
>>>>> 700K members, plus another 100K centroid sequences and 100K sequence
>>>>> alignment records.) and once I used an id for an object created by
>>>>> DirectDataLoader.createObject that collided with one in the db (the
>>>>> first organism record, as it turned out), then I got this weird
>>>>> object merger. What appears to have been an important factor was the
>>>>> fact that I was trying to minimize the querying of the database
>>>>> during the loading process by telling the IntegrationWriter what
>>>>> elements that I had retrieved from the database - including
>>>>> organisms - that do not need to be queried by inserting into
>>>>> IntegrationWriter’s idMap. I think Richard had suggested this; I’m
>>>>> not totally sure. So I made a markElementAsStored routine in
>>>>> IntegrationWriter to do this.
>>>>>
>>>>> But when IntegrationWriter.getEquivalentObject is called, if an id of
>>>>> an
>>>>> object created with DirectDataLoader.createObject coincides with an
>>>>> id
>>>>> in
>>>>> idMap, then the two things will be called equivalent and some sort of
>>>>> mess
>>>>> gets created.
>>>>>
>>>>> Now, I see that manipulating idMap is a dangerous thing and I’ll
>>>>> stop. Or at least be more careful in how I do it. But I’m curious
>>>>> about this approach. It seems to me that there will always be a
>>>>> possibility that an id generated by createObject will collide with
>>>>> the id of something that has already been retrieved - and possibly
>>>>> updated - by the IntegrationWriter. So there is a slight chance of a
>>>>> collision.
>>>>>
>>>>> I’m trying a couple of workarounds: one is to decrement the
>>>>> idCounter from 0 in DirectDataLoader.createObject rather than
>>>>> incrementing it. So long as it is unique, this should be OK, right?
>>>>> The other is to call setId with getIntegrationWriter().getSerial().
>>>>> Both appear to work. The first method may give problems if I have to
>>>>> worry about integer wraparound. The second wastes some serial
>>>>> numbers. Both methods seem to work at first blush. I’m tempted to go
>>>>> with the first: chances are, if I have integer wraparound I’m going
>>>>> to have other problems.
>>>>>
>>>>> Thanks for all your work and help on this!
>>>>>
>>>>> Joe Carlson
>>>>>
>>>>> On Jun 25, 2015, at 11:19 PM, Joe Carlson <jwcarlson at lbl.gov> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> So I’m seeing something very, very weird. Somehow I’m managing to
>>>>>> get items in different tables with the same id and a corrupted
>>>>>> record in the intermineobject table.
>>>>>>
>>>>>> I’m loading clusters of proteins. The relevant tables are
>>>>>> proteinfamily and proteinfamilymember. Each collection of
>>>>>> proteinfamilies is based on the set of organisms in the family
>>>>>> building (some collections have only a few organisms, others have
>>>>>> many).
>>>>>>
>>>>>> In 3 out of 12 of the collections, the data loading fails with a
>>>>>> very cryptic error message. Here is an example of the message:
>>>>>>> /global/u1/j/jcarlson/src/intermine/bio/sources/phytozome-clusters/build.xml:23:
>>>>>>> java.lang.IllegalArgumentException: Conflicting values for field
>>>>>>> Gene,ProteinFamilyMember.organism between
>>>>>>> phytozome-chado-A.coerulea
>>>>>>> (value "Organism,ProteinFamilyMember [annotationVersion="v1.1",
>>>>>>> assemblyVersion="v1", commonName="Colorado blue columbine",
>>>>>>> count="10",
>>>>>>> genus="Aquilegia", id="1000003", membershipDetail="HMM pledge -
>>>>>>> complete", name="Aquilegia coerulea", organism=89000000,
>>>>>>> protein=91700834, proteinFamily=378112305, proteomeId="195",
>>>>>>> shortName="A. coerulea", species="coerulea", taxonId="218851",
>>>>>>> version="current"]" in database with ID 1973680) and
>>>>>>> phytozome-cluster-node-4956 (value "Organism
>>>>>>> [annotationVersion="v1.1",
>>>>>>> assemblyVersion="v1.0", commonName="switchgrass", genus="Panicum",
>>>>>>> id=142000000, name="Panicum virgatum", proteomeId=273,
>>>>>>> shortName="P.
>>>>>>> virgatum", species="virgatum", taxonId=38727, version="current"]"
>>>>>>> being
>>>>>>> stored). This field needs configuring in the
>>>>>>> genomic_priorities.properties file
>>>>>>>         at
>>>>>>> org.intermine.dataloader.SourcePriorityComparator.compare(SourcePriorityComparator.java:276)
>>>>>>>         at
>>>>>>> org.intermine.dataloader.SourcePriorityComparator.compare(SourcePriorityComparator.java:34)
>>>>>>>         at java.util.TreeMap.put(TreeMap.java:545)
>>>>>>>         at java.util.TreeSet.add(TreeSet.java:255)
>>>>>>>         at
>>>>>>> org.intermine.dataloader.IntegrationWriterDataTrackingImpl.store(IntegrationWriterDataTrackingImpl.java:385)
>>>>>>
>>>>>> It is strange since the data that I’m loading has no previous
>>>>>> objects to compare. This confused me for a long time; I thought
>>>>>> I just had problems with my keys. Then I saw that there was a
>>>>>> corruption in the intermineobject table:
>>>>>>
>>>>>>> select * from intermineobject where id=1000003;
>>>>>>> object | id | class
>>>>>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------------------------------------------------------------------------------
>>>>>>> $_^org.intermine.model.bio.Organism
>>>>>>> org.intermine.model.bio.ProteinFamilyMember$_^aannotationVersion$_^v1.1$_^aassemblyVersion$_^v1$_^acommonName$_^Colorado
>>>>>>> blue
>>>>>>> columbine$_^acount$_^10$_^agenus$_^Aquilegia$_^aid$_^1000003$_^amembershipDetail$_^HMM
>>>>>>> pledge - complete$_^aname$_^Aquilegia
>>>>>>> coerulea$_^rorganism$_^89000000$_^rprotein$_^91700834$_^rproteinFamily$_^378112305$_^aproteomeId$_^195$_^ashortName$_^A.
>>>>>>> coerulea$_^aspecies$_^coerulea$_^ataxonId$_^218851$_^aversion$_^current
>>>>>>> | 1000003 | org.intermine.model.bio.Organism
>>>>>>> org.intermine.model.bio.ProteinFamilyMember
>>>>>>> (1 row)
>>>>>>
>>>>>> This is some sort of unholy union of an organism and a
>>>>>> proteinfamilymember. There is an entry in both the organism table
>>>>>> and
>>>>>> proteinfamilymember table with this id. The fields of these two
>>>>>> records
>>>>>> are OK, other than the fact that the class field is the
>>>>>> concatenation
>>>>>> of
>>>>>> the 2 class names.
>>>>>>
>>>>>> The behavior is reproducible; after inserting ~ 100K families and
>>>>>> ~ 700K members, the loading fails on the same exact record if I
>>>>>> load in the same order. If I change the loading, there is a similar
>>>>>> error on a different entry. 1000003 is my first “non-trivial”
>>>>>> intermine object (the others being sequence ontology, a data source
>>>>>> and a data set record.)
>>>>>>
>>>>>> Have you seen this type of behavior before? I just found out about
>>>>>> this record corruption tonight. The fact that it is so reproducible
>>>>>> makes me think there is some sort of counter rollover that I’m
>>>>>> running into.
>>>>>>
>>>>>> In the interests of full disclosure, I should say I’m using a
>>>>>> direct data loader. The code is in my github repo in
>>>>>> bio/sources/phytozome-clusters/.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Joe
>>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev at intermine.org
>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>
>>>
>>>
>
>



