[InterMine Dev] How to define the locatedOn attribute of Location and avoid the duplicate objects error

Dr. Intikhab Alam intikhab.alam at kaust.edu.sa
Mon Mar 5 11:40:47 GMT 2012


Hi,

If I want to use my own data to describe the locatedOn attribute of
Location for the Dmel genome, what is the best way to define locatedOn
and avoid duplicate objects, e.g. the Organism object?

I am trying to build FlyMine with my own data and followed the
instructions at
http://intermine.org/wiki/FlyMineOwnData (using pg_restore). What is the
best way to define the locatedOn attribute of my cageTags data? I can read
all the FlyMine data and my cageTags data, but at the integration stage I
get the duplicate objects error:

BUILD FAILED
/home/intikhab/biosoft/intermine_0_99/imbuild/integrate.xml:54: The
following error occurred while executing this line:
/home/intikhab/biosoft/intermine_0_99/imbuild/source.xml:330:
java.lang.RuntimeException: Exception while dataloading - to allow
multiple errors, set the property "dataLoader.allowMultipleErrors" to true
Problem while loading item identifier 0_1 because
There are duplicate objects in the source being loaded, multiple items are
identical according to the primary key being used. Storing again to id
1738000001 object from source Organism [commonName="null", genus="null",
id="1", name="Drosophila melanogaster", shortName="D. melanogaster",
species="null", taxonId="7227"]



When I load my data from a large-xml type source, I declare the following
so that my cageTags have a locatedOn attribute:


my ( $taxonid, $longname, $shortname ) =
    ( "7227", "Drosophila melanogaster", "D. melanogaster" );    # get_organism_detail($taxonfile);

my $organism = $doc->add_item(
    "Organism",
    "taxonId"   => $taxonid,
    "name"      => $longname,
    "shortName" => $shortname,
);


$chromosome = $doc->add_item(
    'Chromosome',
    'primaryIdentifier' => $chr,
    'dataSets'          => [$data_set_item],
    #'sequence'          => $CHRseq,
    #'length'            => $chromlen,
    'organism'          => $organism,
);



$doc->add_item(
    'Location',
    'start'     => $st,
    'end'       => $end,
    'strand'    => $strand,
    'feature'   => $cagecluster,
    'locatedOn' => $chromosome,
);



Obviously, the Organism object would already be in the database before my
addition, but how can I use the locatedOn attribute so that it goes to the
right dataSet?
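Since the error reports multiple items in the source that are identical according to the primary key, one common way to avoid it is to create each Organism and Chromosome item only once per source and reuse it for every Location. The sketch below is illustrative only: `new_item_stub` and `get_or_create` are hypothetical helpers (not the InterMine API) standing in for `$doc->add_item`, and the CAGE-tag rows are made-up values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $doc->add_item (NOT the InterMine API):
# it builds a plain hash and records it so the created items can be counted.
my @created;
sub new_item_stub {
    my ( $class, %fields ) = @_;
    my $item = { class => $class, %fields };
    push @created, $item;
    return $item;
}

# Hypothetical cache keyed by class and primary-key value, so each
# Organism and Chromosome item is created at most once per source.
my %item_cache;
sub get_or_create {
    my ( $class, $key, %fields ) = @_;
    return $item_cache{ join( ':', $class, $key ) } //= new_item_stub( $class, %fields );
}

# Illustrative CAGE-tag rows: chromosome, start, end, strand (made-up values).
my @tags = (
    [ '2L', 100, 150, 1 ],
    [ '2L', 300, 380, -1 ],
    [ '3R', 500, 560, 1 ],
);

my @locations;
for my $row (@tags) {
    my ( $chr, $st, $end, $strand ) = @$row;
    my $organism = get_or_create(
        'Organism', '7227',
        taxonId   => '7227',
        name      => 'Drosophila melanogaster',
        shortName => 'D. melanogaster',
    );
    my $chromosome = get_or_create(
        'Chromosome', $chr,
        primaryIdentifier => $chr,
        organism          => $organism,
    );
    # Other Location fields (e.g. feature) omitted for brevity.
    push @locations, new_item_stub(
        'Location',
        start     => $st,
        end       => $end,
        strand    => $strand,
        locatedOn => $chromosome,
    );
}

my %count;
$count{ $_->{class} }++ for @created;
print "$_: $count{$_}\n" for sort keys %count;
# prints:
# Chromosome: 2
# Location: 3
# Organism: 1
```

This only removes duplicates within one source; how that single Organism is then merged with the Organism loaded by the other FlyMine sources is decided at integration time by the primary keys configured for each source, which this sketch does not cover.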


I have a similar issue with the KEGG data I loaded in my other project,
redseamine: I cannot see the Pathways widget displayed, apart from one
section on my Gene report page that lists the pathways involved, but it
displays the source of the data as my project name.

There I declare the source as:

my $keggdata_source_item =
    $doc->add_item( DataSource => ( name => 'KEGG', ), );

my $keggdata_set_item = $doc->add_item(
    DataSet => (
        name         => "KEGG",
        description  => "KEGG",
        'dataSource' => $keggdata_source_item,
    ),
);



my $pathway = $doc->add_item(
    'Pathway',
    'identifier' => $kmapid,
    'name'       => $kmapdesc,
    'genes'      => [$gene],
    'dataSets'   => [$keggdata_set_item],
);
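If the Pathway block above runs once per gene, it creates several Pathway items with the same identifier, which may also trigger the duplicate check if identifier is the Pathway primary key for this source. A minimal sketch of one way around that, assuming a hypothetical `new_item_stub` stand-in for `$doc->add_item` (not the InterMine API) and made-up gene IDs (the two KEGG map IDs are just examples): the DataSource and DataSet are created once, genes are grouped by map ID, and exactly one Pathway is created per map ID.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $doc->add_item (NOT the InterMine API).
my @created;
sub new_item_stub {
    my ( $class, %fields ) = @_;
    my $item = { class => $class, %fields };
    push @created, $item;
    return $item;
}

# DataSource and DataSet are created once, outside any loop.
my $kegg_source = new_item_stub( 'DataSource', name => 'KEGG' );
my $kegg_set    = new_item_stub(
    'DataSet',
    name        => 'KEGG',
    description => 'KEGG',
    dataSource  => $kegg_source,
);

# Illustrative (map ID, map description, gene ID) rows; gene IDs are made up.
my @rows = (
    [ '00010', 'Glycolysis / Gluconeogenesis', 'geneA' ],
    [ '00010', 'Glycolysis / Gluconeogenesis', 'geneB' ],
    [ '00020', 'Citrate cycle (TCA cycle)',    'geneA' ],
);

# First pass: one Gene item per gene ID, gene lists grouped by map ID.
my ( %gene_items, %genes_by_map, %desc_by_map );
for my $row (@rows) {
    my ( $kmapid, $kmapdesc, $gene_id ) = @$row;
    $gene_items{$gene_id} //=
      new_item_stub( 'Gene', primaryIdentifier => $gene_id );
    push @{ $genes_by_map{$kmapid} }, $gene_items{$gene_id};
    $desc_by_map{$kmapid} = $kmapdesc;
}

# Second pass: exactly one Pathway item per map ID, all sharing one DataSet.
my @pathways;
for my $kmapid ( sort keys %genes_by_map ) {
    push @pathways, new_item_stub(
        'Pathway',
        identifier => $kmapid,
        name       => $desc_by_map{$kmapid},
        genes      => $genes_by_map{$kmapid},
        dataSets   => [$kegg_set],
    );
}

printf "%s %s: %d gene(s)\n", $_->{identifier}, $_->{name}, scalar @{ $_->{genes} }
  for @pathways;
# prints:
# 00010 Glycolysis / Gluconeogenesis: 2 gene(s)
# 00020 Citrate cycle (TCA cycle): 1 gene(s)
```

Building each Pathway only after all of its genes are known also sidesteps the question of whether a collection can still be extended after an item has been added to the document.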



Any help on properly defining the dataSets to avoid duplicate entries?

Regards,

Intikhab
