[InterMine Dev] error whil loading genes items xml file

Alex Kalderimis alex at intermine.org
Mon Aug 12 17:14:13 BST 2013


On Mon, Aug 12, 2013 at 01:52:44PM +0000, Jayaraman, Pushkala wrote:
> Could it be possible that this error gets thrown not for genes with similar RGDIds but similar Ensembl Ids that are modeled as synonyms for these genes?
> 

It depends on how you have configured your primary keys. Note that the
key values must be identical, not merely similar, for two objects to be
treated as duplicates.
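
As a rough illustration of "identical, not similar", here is a minimal
sketch that reports which key values occur more than once in a list.
It is not InterMine code: the class and method names are made up, and
it assumes you have already pulled the key field (e.g. primaryIdentifier)
out of your items as plain strings. Only "RGD:7242712" comes from the
error message; the other ids are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of InterMine.
class DuplicateKeyFinder {

    /** Returns the key values that appear more than once, in first-seen order. */
    static List<String> findDuplicates(List<String> keyValues) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keyValues) {
            // merge() starts the count at 1, or adds 1 to the existing count.
            // Map lookup uses String.equals, so only exact matches collide.
            counts.merge(key, 1, Integer::sum);
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Invented sample data: the first and third values are identical,
        // the second and fourth are merely similar to them.
        List<String> ids = List.of(
            "RGD:7242712", "RGD:7242713", "RGD:7242712", "RGD:72427120");
        System.out.println(findDuplicates(ids)); // prints [RGD:7242712]
    }
}
```

Running something like this over each key field in turn should show
which field is actually responsible for the collision.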

Alex

> 
> 
> -----Original Message-----
> From: A.J. Kalderimis [mailto:ajk59 at hermes.cam.ac.uk] On Behalf Of Alex Kalderimis
> Sent: Monday, August 12, 2013 7:53 AM
> To: Jayaraman, Pushkala
> Cc: dev at intermine.org
> Subject: Re: [InterMine Dev] error whil loading genes items xml file
> 
> When you write a source, you need to ensure that each set of objects is just that, a set, i.e. that the following relationship holds:
> 
>   for all objects, there exists no object which is "equivalent"
>   to any other.
> 
> Where "equivalent" has the following meaning in this context:
> 
>   Objects are equivalent when they have the same values for fields
>   which are used to compute the primary key.
> 
> So if "taxonId" is used as the primary key for organism, there should only be one object in the set of objects to load with the taxonId '9696'.
> 
> The error you received suggests that two genes in your set are equivalent (at a guess, I would suggest that they share RGD ids, a.k.a. primaryIdentifier).
> 
> Avoiding this is the task of the programmer. The easiest way to do it is to store a mapping from primary key to object for the objects you expect to encounter multiple times (such as organisms, since each gene will have an organism), and only write the objects to a file or db when you have finished processing them all. Obviously this can cause issues with memory management, but sadly that must be mitigated on a case-by-case basis.
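
The mapping described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: OrganismCache and
its methods are hypothetical names, not InterMine API, and items are
reduced to plain strings. The taxon id '9696' is the one used in the
example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a source's object cache; not InterMine API.
class OrganismCache {
    // Maps the primary-key value (here a taxon id) to the one item id created for it.
    private final Map<String, String> itemIdByTaxonId = new LinkedHashMap<>();
    private int nextId = 1;

    /** Returns the existing item id for this taxon id, creating one only on first sight. */
    String getOrCreate(String taxonId) {
        return itemIdByTaxonId.computeIfAbsent(taxonId, key -> "0_" + nextId++);
    }

    /** Call once, after all input is processed: yields one entry per distinct key. */
    List<String> itemsToWrite() {
        List<String> out = new ArrayList<>();
        itemIdByTaxonId.forEach((taxonId, itemId) -> out.add(itemId + " taxonId=" + taxonId));
        return out;
    }

    public static void main(String[] args) {
        OrganismCache cache = new OrganismCache();
        // Two genes referring to the same organism get the same item back.
        String first = cache.getOrCreate("9696");
        String second = cache.getOrCreate("9696");
        System.out.println(first.equals(second));   // prints true
        System.out.println(cache.itemsToWrite());   // prints [0_1 taxonId=9696]
    }
}
```

Every gene then holds a reference to the cached item id, and the
organism itself is written exactly once at the end. The map grows with
the number of distinct keys, which is where the memory concern
mentioned above comes from.
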
> 
> I hope this helps,
> 
> Alex.
> 
>     
> On Mon, Aug 12, 2013 at 12:44:56PM +0000, Jayaraman, Pushkala wrote:
> > Hello,
> > This error is fairly new. I don't know what I'm doing wrong here, and I need help figuring this out:
> > 
> > Caused by: java.lang.IllegalArgumentException: There are duplicate objects in the source being loaded, multiple items are identical according to the primary key being used. Storing again to id 1019692 object from source Gene [briefDescription="null", chromosome=null, chromosomeLocation=null, description="null", downstreamIntergenicRegion=null, ensemblIdentifier="null", fishBand="null", geneType="pseudo", id="24621", length="null", name="crooked neck pre-mRNA splicing factor-like 1 (Drosophila) pseudogene", ncbiGeneNumber="116481", nomenclatureStatus="APPROVED", organism=1, pharmGKBidentifier="null", primaryIdentifier="RGD:7242712", score="null", scoreType="null", secondaryIdentifier="RGD:7242712", sequence=null, sequenceOntologyTerm=3, soTerm=null, symbol="Crnkl1-ps1", upstreamIntergenicRegion=null]
> >         at org.intermine.dataloader.IntegrationWriterDataTrackingImpl.store(IntegrationWriterDataTrackingImpl.java:297)
> >         at org.intermine.dataloader.IntegrationWriterAbstractImpl.store(IntegrationWriterAbstractImpl.java:171)
> >         at org.intermine.dataloader.XmlDataLoader.processXml(XmlDataLoader.java:80)
> >         at org.intermine.dataloader.XmlDataLoaderTask.execute(XmlDataLoaderTask.java:160)
> >         ... 31 more
> > 
> > Total time: 22 seconds
> > Mon Aug 12 07:22:34 CDT 2013
> > 
> > Here is the part of the XML file in question:
> > 
> > <item id="0_24621" class="" implements="Gene">
> >       <attribute name="symbol" value="Crnkl1-ps1" />
> >       <reference name="sequenceOntologyTerm" ref_id="0_3" />
> >       <attribute name="primaryIdentifier" value="RGD:7242712" />
> >       <attribute name="name" value="crooked neck pre-mRNA splicing factor-like 1 (Drosophila) pseudogene" />
> >       <collection name="dataSets">
> >          <reference ref_id="0_2" />
> >       </collection>
> >       <attribute name="secondaryIdentifier" value="RGD:7242712" />
> >       <attribute name="geneType" value="pseudo" />
> >       <attribute name="ncbiGeneNumber" value="116481" />
> >       <reference name="organism" ref_id="0_1" />
> >       <attribute name="nomenclatureStatus" value="APPROVED" />
> >    </item>
> >    <item id="0_24622" class="" implements="Synonym">
> >       <attribute name="value" value="116481" />
> >       <reference name="subject" ref_id="0_24621" />
> >    </item>
> >    <item id="0_24623" class="" implements="Publication">
> >       <attribute name="pubMedId" value="11804325" />
> >    </item>
> >    <item id="0_24624" class="" implements="Publication">
> >       <attribute name="pubMedId" value="10217146" />
> >    </item>
> >    <item id="0_24625" class="" implements="Protein">
> >       <attribute name="primaryAccession" value="B2GUU9" />
> >       <reference name="organism" ref_id="0_1" />
> >    </item>
> >    <item id="0_24626" class="" implements="Protein">
> >       <attribute name="primaryAccession" value="Q923I8" />
> >       <reference name="organism" ref_id="0_1" />
> >    </item>
> > 
> > 
> > 
> > Pushkala Jayaraman
> > Programmer/Analyst - Rat Genome Database
> > Human and Molecular Genetics Center
> > Medical College of Wisconsin
> > 414-955-2229
> > http://rgd.mcw.edu
> > 
> 
> > _______________________________________________
> > dev mailing list
> > dev at intermine.org
> > http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
> 


