[InterMine Dev] failed data integration for malariamine example

Pengcheng Yang pengchy at gmail.com
Wed Jul 15 07:19:03 BST 2015


Hi,

From the InterMine documentation (pages 27-28), I learned that this
problem is caused by the gff3 and uniprot sources assigning different
values to gene.primaryidentifier for the same gene. By default, the gff3
source takes gene.primaryidentifier from the ID attribute and
gene.symbol from the Name attribute. My question is: how can I make
gene.primaryidentifier come from Name instead? The documentation
mentions this problem briefly and seems to suggest modifying the
MalariaGFF3RecordHandler class in
bio/sources/example-sources/malaria-gff/main/src/org/intermine/bio/dataconversion/MalariaGFF3RecordHandler.java.
Is there a detailed description of how to modify this file so that
gene.primaryidentifier is taken from the Name attribute?
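A minimal sketch of the swap being asked about, written as plain Java so it runs on its own. It assumes, as described above, that the gff3 source has already copied the GFF3 ID into primaryIdentifier and Name into symbol. FeatureItem and useNameAsPrimaryIdentifier are hypothetical names, not InterMine's API; in the real MalariaGFF3RecordHandler the same steps would presumably go in the handler's per-record process method, using InterMine's own item accessors (check the class for the exact names and signatures).

```java
import java.util.HashMap;
import java.util.Map;

public class GeneIdSwapSketch {

    /** Hypothetical stand-in for a converter item: a class name plus string attributes. */
    static final class FeatureItem {
        private final String className;
        private final Map<String, String> attributes = new HashMap<>();

        FeatureItem(String className) {
            this.className = className;
        }

        String getClassName() {
            return className;
        }

        String get(String name) {
            return attributes.get(name);
        }

        void set(String name, String value) {
            attributes.put(name, value);
        }

        void remove(String name) {
            attributes.remove(name);
        }
    }

    /**
     * For Gene features only: keep the GFF3 ID as secondaryIdentifier, promote the
     * Name-derived symbol to primaryIdentifier, then clear symbol because the value
     * was an identifier rather than a gene symbol.
     */
    static void useNameAsPrimaryIdentifier(FeatureItem feature) {
        if (!"Gene".equals(feature.getClassName())) {
            return; // other feature types are left untouched
        }
        String name = feature.get("symbol");
        if (name == null) {
            return; // the GFF3 record had no Name attribute: nothing to swap
        }
        String gffId = feature.get("primaryIdentifier");
        if (gffId != null) {
            feature.set("secondaryIdentifier", gffId);
        }
        feature.set("primaryIdentifier", name);
        feature.remove("symbol");
    }

    public static void main(String[] args) {
        // Assumed input: "gff-id-placeholder" is a hypothetical GFF3 ID, and the
        // Name attribute is assumed to hold PFL1385c, the identifier UniProt uses.
        FeatureItem gene = new FeatureItem("Gene");
        gene.set("primaryIdentifier", "gff-id-placeholder");
        gene.set("symbol", "PFL1385c");

        useNameAsPrimaryIdentifier(gene);
        // prints "PFL1385c gff-id-placeholder"
        System.out.println(gene.get("primaryIdentifier") + " " + gene.get("secondaryIdentifier"));
    }
}
```

Keeping the GFF3 ID as secondaryIdentifier preserves it for lookups instead of discarding it. Note also that a change to MalariaGFF3RecordHandler can only take effect if the GFF3 source in project.xml is the one wired to that handler: the example code lives under malaria-gff, while the project.xml quoted below declares type="gff". Whether that mismatch matters here is worth checking.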

Best,
Pengcheng Yang


On 2015/7/15 9:22, Pengcheng Yang wrote:
> Hi,
>
> I have successfully loaded the uniprot and gff3 sources following the
> tutorial in the InterMine documentation. However, when I run the
> commands to check data integration, the results show that the two data
> sets were not integrated through primaryidentifier.
>
> The psql commands and the content of project.xml are included below.
>
> Best,
> Pengcheng Yang
>
> [1] The psql commands:
> malariamine=# select id, primaryidentifier, secondaryidentifier, symbol, length, chromosomeid, chromosomelocationid, organismid from gene where primaryIdentifier = 'PFL1385c';
>    id    | primaryidentifier | secondaryidentifier | symbol | length | chromosomeid | chromosomelocationid | organismid
> ---------+-------------------+---------------------+--------+--------+--------------+----------------------+------------
>  1000581 | PFL1385c          |                     | ABRA   |        |              |                      |    1000026
> (1 row)
>
> malariamine=# select * from gene where primaryIdentifier = 'PFL1385c';
>  briefdescription | score | description | scoretype |   id    | symbol | length | name | primaryidentifier | secondaryidentifier | upstreamintergenicregionid | downstreamintergenicregionid | sequenceontologytermid | organismid | chromosomelocationid | sequenceid | chromosomeid |            class
> ------------------+-------+-------------+-----------+---------+--------+--------+------+-------------------+---------------------+----------------------------+------------------------------+------------------------+------------+----------------------+------------+--------------+------------------------------
>                   |       |             |           | 1000581 | ABRA   |        |      | PFL1385c          |                     |                            |                              |                1000081 |    1000026 |                      |            |              | org.intermine.model.bio.Gene
> (1 row)
>
> [2] The content of project.xml:
> <project type="bio">
>   <property name="target.model" value="genomic"/>
>   <property name="source.location" location="../bio/sources/"/>
>   <property name="common.os.prefix" value="common"/>
>   <property name="intermine.properties.file" value="malariamine.properties"/>
>   <property name="default.intermine.properties.file" location="../default.intermine.integrate.properties"/>
>   <sources>
>     <source name="uniprot-malaria" type="uniprot">
>       <property name="uniprot.organisms" value="36329"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/uniprot/"/>
>     </source>
>     <source name="go-malaria" type="go">
>       <property name="go.organisms" value="36329"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/go/"/>
>     </source>
>     <source name="go-annotation-malaria" type="go-annotation">
>       <property name="go-annotation.organisms" value="36329"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/go-annotation/"/>
>     </source>
>     <source name="malaria-chromosome-fasta" type="fasta">
>       <property name="fasta.taxonId" value="36329"/>
>       <property name="fasta.dataSourceName" value="PlasmoDB"/>
>       <property name="fasta.dataSetTitle" value="PlasmoDB chromosome sequence"/>
>       <property name="fasta.className" value="org.intermine.model.bio.Chromosome"/>
>       <property name="fasta.sequenceType" value="dna"/>
>       <property name="fasta.includes" value="MAL*fasta"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/genome/fasta/"/>
>     </source>
>     <source name="gff-malaria" type="gff">
>       <property name="gff3.taxonId" value="36329"/>
>       <property name="gff3.seqClsName" value="Chromosome"/>
>       <property name="gff3.dataSourceName" value="PlasmoDB"/>
>       <property name="gff3.seqDataSourceName" value="PlasmoDB"/>
>       <property name="gff3.dataSetTitle" value="PlasmoDB P.falciparum genome"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/genome/gff/"/>
>     </source>
>     <source name="interpro-malaria" type="interpro">
>       <property name="interpro.organisms" value="36329"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/interpro/"/>
>     </source>
>     <source name="kegg-pathway-malaria" type="kegg-pathway">
>       <property name="kegg-pathway.organisms" value="36329"/>
>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/kegg/"/>
>     </source>
>   </sources>
>
>   <post-processing>
>   </post-processing>
>
> </project>
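The integration failure in the quoted message can be illustrated with a toy model. In the psql output, gene PFL1385c carries the UniProt symbol ABRA and an organism, but chromosomeid and chromosomelocationid are empty, which is consistent with the GFF3 copy of the gene having been stored as a separate object under a different primaryidentifier. The sketch below merges gene records only when their primaryIdentifier values are equal. It is a simplification, not InterMine's integration code (which works from per-source key definitions and source priorities); GeneRecord, mergeOnPrimaryIdentifier, and the placeholder values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyMergeSketch {

    /** Hypothetical, simplified gene record as one source might supply it. */
    static final class GeneRecord {
        final String primaryIdentifier;
        final Map<String, String> fields;

        GeneRecord(String primaryIdentifier, Map<String, String> fields) {
            this.primaryIdentifier = primaryIdentifier;
            this.fields = fields;
        }
    }

    /**
     * Toy integration step: records with equal primaryIdentifier values collapse
     * into one gene; records with different values stay separate. The first record
     * to supply a field keeps it (a crude stand-in for source priorities).
     */
    static Map<String, Map<String, String>> mergeOnPrimaryIdentifier(List<GeneRecord> records) {
        Map<String, Map<String, String>> genes = new LinkedHashMap<>();
        for (GeneRecord r : records) {
            Map<String, String> gene =
                    genes.computeIfAbsent(r.primaryIdentifier, k -> new LinkedHashMap<>());
            r.fields.forEach(gene::putIfAbsent);
        }
        return genes;
    }

    public static void main(String[] args) {
        GeneRecord uniprot = new GeneRecord("PFL1385c", Map.of("symbol", "ABRA"));

        // GFF3 gene keyed on a hypothetical ID: the keys differ, so nothing merges.
        GeneRecord gffById = new GeneRecord("gff-id-placeholder",
                Map.of("chromosomeLocation", "loc-placeholder"));
        System.out.println(mergeOnPrimaryIdentifier(List.of(uniprot, gffById)).size()); // prints 2

        // Same GFF3 gene keyed on its Name value: one gene with fields from both sources.
        GeneRecord gffByName = new GeneRecord("PFL1385c",
                Map.of("chromosomeLocation", "loc-placeholder"));
        // prints {PFL1385c={symbol=ABRA, chromosomeLocation=loc-placeholder}}
        System.out.println(mergeOnPrimaryIdentifier(List.of(uniprot, gffByName)));
    }
}
```

The first call mirrors what the psql output suggests happened (two gene objects, only one of which is found by the query); the second shows the outcome the Name-based primaryidentifier is meant to produce.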



