[InterMine Dev] failed data integration for malariamine example

Pengcheng Yang pengchy at gmail.com
Wed Jul 15 08:45:24 BST 2015


Hi,

Following the documentation, I have modified the project.properties file 
in the malariamine/dbmodel directory [1], to which I have added the 
properties file path and the related information. But when I reload the 
gff3 data, the primaryidentifier is still the Name attribute of the gff3 
file. I have not modified the class 
org.intermine.bio.dataconversion.MalariaGFF3RecordHandler. How should I 
define the primaryidentifier?
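
The mapping asked about here can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (Gff3IdentifierSketch, parseAttributes, choosePrimaryIdentifier) and the sample attribute string are hypothetical, and the code deliberately does not use InterMine's record handler API. It shows the intended "Name first, ID as fallback" choice, not how MalariaGFF3RecordHandler implements it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in sketch; these are NOT InterMine classes or methods. */
final class Gff3IdentifierSketch {

    /** Parse the ninth GFF3 column ("ID=...;Name=...") into key/value pairs. */
    static Map<String, String> parseAttributes(String column9) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String pair : column9.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip empty or malformed entries
                attributes.put(pair.substring(0, eq).trim(),
                               pair.substring(eq + 1).trim());
            }
        }
        return attributes;
    }

    /** Prefer Name as the primary identifier; fall back to ID if Name is absent. */
    static String choosePrimaryIdentifier(Map<String, String> attributes) {
        String name = attributes.get("Name");
        if (name != null && !name.isEmpty()) {
            return name;
        }
        return attributes.get("ID"); // may be null if neither attribute is present
    }

    public static void main(String[] args) {
        // Hypothetical attribute column, made up for illustration.
        Map<String, String> attrs = parseAttributes("ID=gene.1;Name=PFL1385c");
        System.out.println(choosePrimaryIdentifier(attrs)); // prints PFL1385c
    }
}
```

Presumably the same choice would be made inside the handler class named by gff3.handlerClassName, so that the gff3 genes carry the identifier that uniprot uses.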

Best,
Pengcheng Yang


[1] malariamine/dbmodel/project.properties
compile.dependencies = intermine/integrate/main, \
                 bio/core/main, \
                 bio/sources/example-sources/malaria-gff/main

objectstore.name = os.production
model.name = genomic

core.model.path = bio/core
extra.model.paths.start = malariamine/dbmodel/build/model/so_additions.xml bio/core/genomic_additions.xml

so.termlist.file = resources/so_terms
so.obo.file = ../../bio/sources/so/so.obo
so.output.file = build/model/so_additions.xml

# choose the intermine.properties file from $HOME:
intermine.properties.file = malariamine.properties
default.intermine.properties.file = ../default.intermine.integrate.properties

# Set the source type to be gff3
have.file.gff3 = true

# specify a Java class to be called on each row of the gff file to cope with attributes
gff3.handlerClassName = org.intermine.bio.dataconversion.MalariaGFF3RecordHandler

On 2015/7/15 14:19, Pengcheng Yang wrote:
> Hi,
>
> From the InterMine documentation (pages 27-28), I learned that this 
> problem is caused by gff3 and uniprot using different values for 
> gene.primaryidentifier. By default, gff3 takes gene.primaryidentifier 
> from ID and gene.symbol from Name. My question is how to make 
> gene.primaryidentifier come from Name instead. The documentation 
> mentions this problem briefly and seems to suggest modifying the 
> MalariaGFF3RecordHandler class in 
> bio/sources/example-sources/malaria-gff/main/src/org/intermine/bio/dataconversion/MalariaGFF3RecordHandler.java. 
> Is there a detailed description of how to modify this file so that 
> gene.primaryidentifier is taken from the Name attribute?
>
> Best,
> Pengcheng Yang
>
>
> On 2015/7/15 9:22, Pengcheng Yang wrote:
>> Hi,
>>
>> I have successfully loaded uniprot and gff3 following the tutorial in 
>> the InterMine documentation. However, when I run the commands that 
>> check data integration, the results show that the two data sets were 
>> not integrated through primaryidentifier.
>>
>> The psql commands [1] and the contents of the project.xml file [2] 
>> are included below.
>>
>> Best,
>> Pengcheng Yang
>>
>> [1] the psql commands:
>> malariamine=# select id, primaryidentifier, secondaryidentifier, symbol, length, chromosomeid, chromosomelocationid, organismid from gene where primaryIdentifier = 'PFL1385c';
>>    id    | primaryidentifier | secondaryidentifier | symbol | length | chromosomeid | chromosomelocationid | organismid
>> ---------+-------------------+---------------------+--------+--------+--------------+----------------------+------------
>>  1000581 | PFL1385c          |                     | ABRA   |        |              |                      |    1000026
>> (1 row)
>>
>>
>>
>> malariamine=# select * from gene where primaryIdentifier = 'PFL1385c';
>>  briefdescription | score | description | scoretype |   id    | symbol | length | name | primaryidentifier | secondaryidentifier | upstreamintergenicregionid | downstreamintergenicregionid | sequenceontologytermid | organismid | chromosomelocationid | sequenceid | chromosomeid |            class
>> ------------------+-------+-------------+-----------+---------+--------+--------+------+-------------------+---------------------+----------------------------+------------------------------+------------------------+------------+----------------------+------------+--------------+------------------------------
>>                   |       |             |           | 1000581 | ABRA   |        |      | PFL1385c          |                     |                            |                              |                1000081 |    1000026 |                      |            |              | org.intermine.model.bio.Gene
>> (1 row)
>>
>> [2] the project.xml file content:
>> <project type="bio">
>>   <property name="target.model" value="genomic"/>
>>   <property name="source.location" location="../bio/sources/"/>
>>   <property name="common.os.prefix" value="common"/>
>>   <property name="intermine.properties.file" value="malariamine.properties"/>
>>   <property name="default.intermine.properties.file" location="../default.intermine.integrate.properties"/>
>>   <sources>
>>     <source name="uniprot-malaria" type="uniprot">
>>       <property name="uniprot.organisms" value="36329"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/uniprot/"/>
>>     </source>
>>     <source name="go-malaria" type="go">
>>       <property name="go.organisms" value="36329"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/go/"/>
>>     </source>
>>     <source name="go-annotation-malaria" type="go-annotation">
>>       <property name="go-annotation.organisms" value="36329"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/go-annotation/"/>
>>     </source>
>>     <source name="malaria-chromosome-fasta" type="fasta">
>>       <property name="fasta.taxonId" value="36329"/>
>>       <property name="fasta.dataSourceName" value="PlasmoDB"/>
>>       <property name="fasta.dataSetTitle" value="PlasmoDB chromosome sequence"/>
>>       <property name="fasta.className" value="org.intermine.model.bio.Chromosome"/>
>>       <property name="fasta.sequenceType" value="dna"/>
>>       <property name="fasta.includes" value="MAL*fasta"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/genome/fasta/"/>
>>     </source>
>>     <source name="gff-malaria" type="gff">
>>       <property name="gff3.taxonId" value="36329"/>
>>       <property name="gff3.seqClsName" value="Chromosome"/>
>>       <property name="gff3.dataSourceName" value="PlasmoDB"/>
>>       <property name="gff3.seqDataSourceName" value="PlasmoDB"/>
>>       <property name="gff3.dataSetTitle" value="PlasmoDB P.falciparum genome"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/genome/gff/"/>
>>     </source>
>>     <source name="interpro-malaria" type="interpro">
>>       <property name="interpro.organisms" value="36329"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/interpro/"/>
>>     </source>
>>     <source name="kegg-pathway-malaria" type="kegg-pathway">
>>       <property name="kegg-pathway.organisms" value="36329"/>
>>       <property name="src.data.dir" location="/home/pengchy/Soft/05.SystemBiology/malaria/kegg/"/>
>>     </source>
>>   </sources>
>>
>>   <post-processing>
>>   </post-processing>
>>
>> </project>
>
