[InterMine Dev] error while loading genes items xml file

Alex Kalderimis alex at intermine.org
Mon Aug 12 13:53:27 BST 2013


When you write a source, you need to ensure that each set of objects
is just that, a set, i.e. that the following relationship holds:

  no two objects in the set are "equivalent" to each other.

Where "equivalent" has the following meaning in this context:

  Objects are equivalent when they have the same values for the fields
  which are used to compute the primary key.

So if "taxonId" is used as the primary key for organism, there should
only be one object in the set of objects to load with the taxonId
'9696'.

The error you received suggests that two genes in your set are
equivalent (at a guess I would suggest that they share RGD ids, aka.
primaryIdentifier).
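
To confirm that guess before loading, something like the following could
list the key values that occur on more than one item. This is only a
sketch under assumptions: DuplicateKeyFinder is a made-up helper and not
part of InterMine, it assumes the primary key is the single attribute
primaryIdentifier (check the keys configuration of your mine), and the
second item id in the sample data is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not an InterMine class.
class DuplicateKeyFinder {

    /**
     * Maps each key value that is used by more than one item of the given
     * class to the ids of the items that use it.
     */
    static Map<String, List<String>> findDuplicates(String xml, String className, String keyField) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse items XML", e);
        }
        Map<String, List<String>> idsByKey = new LinkedHashMap<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (!className.equals(item.getAttribute("implements"))) {
                continue; // some other class, e.g. Synonym or Protein
            }
            NodeList attrs = item.getElementsByTagName("attribute");
            for (int j = 0; j < attrs.getLength(); j++) {
                Element attr = (Element) attrs.item(j);
                if (keyField.equals(attr.getAttribute("name"))) {
                    idsByKey.computeIfAbsent(attr.getAttribute("value"), k -> new ArrayList<>())
                            .add(item.getAttribute("id"));
                }
            }
        }
        // Keep only the key values shared by two or more items.
        idsByKey.values().removeIf(ids -> ids.size() < 2);
        return idsByKey;
    }

    public static void main(String[] args) {
        // Sample data: the item id 0_30000 is invented for this example.
        String xml = "<items>"
                + "<item id=\"0_24621\" class=\"\" implements=\"Gene\">"
                + "<attribute name=\"primaryIdentifier\" value=\"RGD:7242712\" /></item>"
                + "<item id=\"0_30000\" class=\"\" implements=\"Gene\">"
                + "<attribute name=\"primaryIdentifier\" value=\"RGD:7242712\" /></item>"
                + "</items>";
        System.out.println(findDuplicates(xml, "Gene", "primaryIdentifier"));
    }
}
```

An empty result would mean the duplication is not in that one field, in
which case the key definition itself is the next thing to look at.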

Avoiding this is the task of the programmer; the easiest way to do it
is to store a mapping from primary-key to object for the objects you
expect to encounter multiple times (such as organisms, since each gene
will have an organism), and only write the objects to a file or db
when you have finished processing them all. Obviously this can cause
issues with memory management, but sadly that must be mitigated on a
case by case basis.
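
The mapping approach described above might look roughly like this. It
is a minimal sketch, not the InterMine API: ItemCache and Item are
hypothetical stand-ins, and the key is assumed to be one string per
class (primaryIdentifier for genes, taxonId for organisms).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an InterMine class.
class ItemCache {

    /** Stand-in for an item: a class name, a key value and some attributes. */
    static final class Item {
        final String className;
        final String key;
        final Map<String, String> attributes = new LinkedHashMap<>();

        Item(String className, String key) {
            this.className = className;
            this.key = key;
        }
    }

    // One entry per (class, primary key) pair, in first-seen order.
    private final Map<String, Item> itemsByKey = new LinkedHashMap<>();

    /** Returns the item already held for this key, creating it on first use. */
    Item getOrCreate(String className, String key) {
        return itemsByKey.computeIfAbsent(className + ":" + key, k -> new Item(className, key));
    }

    /** Hands back every held item exactly once and empties the cache. */
    List<Item> drain() {
        List<Item> all = new ArrayList<>(itemsByKey.values());
        itemsByKey.clear();
        return all;
    }

    public static void main(String[] args) {
        ItemCache cache = new ItemCache();

        // The same gene seen twice in the input ends up as one item.
        cache.getOrCreate("Gene", "RGD:7242712").attributes.put("symbol", "Crnkl1-ps1");
        cache.getOrCreate("Gene", "RGD:7242712").attributes.put("geneType", "pseudo");
        // The organism is requested once per gene but held only once
        // (taxonId value taken from the example in this message).
        cache.getOrCreate("Organism", "9696");
        cache.getOrCreate("Organism", "9696");

        // Write only after all input has been processed.
        for (Item item : cache.drain()) {
            System.out.println(item.className + " " + item.key + " " + item.attributes);
        }
    }
}
```

Everything stays in memory until drain() is called, which is the
memory cost mentioned above; draining at a natural boundary in the
input is one possible mitigation when no key can recur after it.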

I hope this helps,

Alex.

    
On Mon, Aug 12, 2013 at 12:44:56PM +0000, Jayaraman, Pushkala wrote:
> Hello,
> This error is fairly new. I don't know what I'm doing wrong here and I need help figuring this out:
> 
> Caused by: java.lang.IllegalArgumentException: There are duplicate objects in the source being loaded, multiple items are identical according to the primary key being used. Storing again to id 1019692 object from source Gene [briefDescription="null", chromosome=null, chromosomeLocation=null, description="null", downstreamIntergenicRegion=null, ensemblIdentifier="null", fishBand="null", geneType="pseudo", id="24621", length="null", name="crooked neck pre-mRNA splicing factor-like 1 (Drosophila) pseudogene", ncbiGeneNumber="116481", nomenclatureStatus="APPROVED", organism=1, pharmGKBidentifier="null", primaryIdentifier="RGD:7242712", score="null", scoreType="null", secondaryIdentifier="RGD:7242712", sequence=null, sequenceOntologyTerm=3, soTerm=null, symbol="Crnkl1-ps1", upstreamIntergenicRegion=null]
>         at org.intermine.dataloader.IntegrationWriterDataTrackingImpl.store(IntegrationWriterDataTrackingImpl.java:297)
>         at org.intermine.dataloader.IntegrationWriterAbstractImpl.store(IntegrationWriterAbstractImpl.java:171)
>         at org.intermine.dataloader.XmlDataLoader.processXml(XmlDataLoader.java:80)
>         at org.intermine.dataloader.XmlDataLoaderTask.execute(XmlDataLoaderTask.java:160)
>         ... 31 more
> 
> Total time: 22 seconds
> Mon Aug 12 07:22:34 CDT 2013
> 
> Here is the part of the XML file in question:
> 
> <item id="0_24621" class="" implements="Gene">
>       <attribute name="symbol" value="Crnkl1-ps1" />
>       <reference name="sequenceOntologyTerm" ref_id="0_3" />
>       <attribute name="primaryIdentifier" value="RGD:7242712" />
>       <attribute name="name" value="crooked neck pre-mRNA splicing factor-like 1 (Drosophila) pseudogene" />
>       <collection name="dataSets">
>          <reference ref_id="0_2" />
>       </collection>
>       <attribute name="secondaryIdentifier" value="RGD:7242712" />
>       <attribute name="geneType" value="pseudo" />
>       <attribute name="ncbiGeneNumber" value="116481" />
>       <reference name="organism" ref_id="0_1" />
>       <attribute name="nomenclatureStatus" value="APPROVED" />
>    </item>
>    <item id="0_24622" class="" implements="Synonym">
>       <attribute name="value" value="116481" />
>       <reference name="subject" ref_id="0_24621" />
>    </item>
>    <item id="0_24623" class="" implements="Publication">
>       <attribute name="pubMedId" value="11804325" />
>    </item>
>    <item id="0_24624" class="" implements="Publication">
>       <attribute name="pubMedId" value="10217146" />
>    </item>
>    <item id="0_24625" class="" implements="Protein">
>       <attribute name="primaryAccession" value="B2GUU9" />
>       <reference name="organism" ref_id="0_1" />
>    </item>
>    <item id="0_24626" class="" implements="Protein">
>       <attribute name="primaryAccession" value="Q923I8" />
>       <reference name="organism" ref_id="0_1" />
>    </item>
> 
> 
> 
> Pushkala Jayaraman
> Programmer/Analyst - Rat Genome Database
> Human and Molecular Genetics Center
> Medical College of Wisconsin
> 414-955-2229
> http://rgd.mcw.edu
> 

> _______________________________________________
> dev mailing list
> dev at intermine.org
> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev



