[InterMine Dev] kegg-pathway load: genespathways table is empty

Pengcheng Yang yangpc at biols.ac.cn
Mon Aug 3 10:22:52 BST 2015


The content of the file
bio/sources/kegg-pathway/main/resources/kegg_config.properties:

# configuration file that determines which fields are set in the kegg converted
#
# <ORGANISM ABBR>.taxonId
# <ORGANISM ABBR>.identifier = which gene field to set
# if identifier is not set, primaryIdentifier will be used
#
# NOTE: this is only a configuration file.  to actually load organisms, add them to the KEGG entry in project.xml
# See http://www.genome.jp/kegg/catalog/org_list.html for list of organism abbreviations


# melanogaster
dme.taxonId = 7227
dme.identifier = primaryIdentifier

# human
hsa.taxonId = 9606
hsa.identifier = symbol

# mouse
mmu.taxonId = 10090
mmu.identifier = primaryIdentifier

# rat
rno.taxonId = 10116

# yeast
sce.taxonId = 4932

# zebrafish
dre.taxonId = 7955

# worm
cel.taxonId = 6239

# malaria
pfa.taxonId = 36329

#my org
org.taxonId = 1111
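
As a rough cross-check of a file like the one above, the following Python sketch collects the `<abbr>.taxonId` entries and reports which taxon IDs requested in project.xml have no matching entry. It assumes keys are matched exactly and case-sensitively (Java properties keys are case-sensitive). The names `parse_kegg_config` and `unconfigured` are hypothetical helpers, not part of InterMine, and the sample text is made up for illustration.

```python
def parse_kegg_config(text):
    """Return {abbreviation: taxon_id} from '<abbr>.taxonId = <id>' lines."""
    taxon_ids = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        abbr, _, field = key.partition(".")
        # Assumption: only an exact, case-sensitive 'taxonId' key counts.
        if field == "taxonId":
            taxon_ids[abbr] = value
    return taxon_ids


def unconfigured(taxon_ids, wanted):
    """Return the taxon IDs in 'wanted' that have no configured abbreviation."""
    configured = set(taxon_ids.values())
    return [taxon for taxon in wanted if taxon not in configured]


# Illustrative sample only; the last key is deliberately mis-cased.
sample = """
# malaria
pfa.taxonId = 36329
# my org
org.taxonId = 1111
dme.taxonid = 7227
"""

config = parse_kegg_config(sample)
print(config)
print(unconfigured(config, ["36329", "1111", "7227"]))
```

With this sample, the mis-cased `dme.taxonid` line is not picked up, so 7227 is reported as unconfigured.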


On 2015/8/3 17:15, Pengcheng Yang wrote:
> Sure,
>
> Content of the file 
> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties:
>
> Organism.key_taxonid=taxonId
> DataSource.key_name=name
> Gene.key_primaryidentifier=primaryIdentifier
> Gene.key_symbol_org=symbol, organism
> Gene.key_secondaryidentifier=secondaryIdentifier, organism
> DataSet.key_title=name
> SOTerm.key=name, ontology
> Ontology.key_title=name
> pfa.taxonId = 36329
> org.taxonId = 1111
>
> Best,
> Pengcheng Yang
>
> On 2015/8/3 17:10, Julie Sullivan wrote:
>> Can you send me the configuration you added to the file?
>>
>> This is the list of KEGG organisms and associated abbreviations:
>>
>>     http://www.genome.jp/kegg/catalog/org_list.html
>>
>>
>> On 03/08/15 10:06, Pengcheng Yang wrote:
>>>
>>> Hi Chen Yian and Julie Sullivan,
>>>
>>> Thank you for your reply and the information.
>>>
>>> I have tried both of the following methods, but the table
>>> "genespathways" remains empty.
>>> 1) Adding org.taxonId=1111 to the
>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties file.
>>> The taxonId and organism name here are made up for confidentiality
>>> reasons.
>>> 2) Removing the "kegg.organisms" property from the project.xml file.
>>>
>>> I have also checked some related information that may be useful:
>>> 1) "select * from pathway" returns the expected information.
>>> 2) I compared the kegg-pathway.log1 files from mymine and malariamine
>>> and found the following lines, which are specific to mymine and do
>>> not exist in malariamine's kegg-pathway.log1:
>>>
>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as
>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>> outdated.
>>> 3172,3173d3169
>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as
>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>> 3217c3213
>>> <     [javac]
>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegion.java 
>>>
>>>
>>> <     [javac]
>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegionShadow.java 
>>>
>>>
>>> 3352,3353d3345
>>> <     [javac]
>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegion.java 
>>>
>>>
>>> <     [javac]
>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegionShadow.java 
>>>
>>>
>>> 3540,3541d3531
>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegion.class 
>>> added as
>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegionShadow.class
>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>> outdated.
>>> 3543,3544d3532
>>> <   [lib:jar] org/intermine/model/bio/ProteinRegion.class added as
>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>> <   [lib:jar] org/intermine/model/bio/ProteinRegionShadow.class 
>>> added as
>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>> <   [lib:jar] adding entry
>>> org/intermine/model/bio/ProteinDomainRegion.class
>>> <   [lib:jar] adding entry
>>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>>> 3720,3721d3705
>>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinRegion.class
>>> <   [lib:jar] adding entry
>>> org/intermine/model/bio/ProteinRegionShadow.class
>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as
>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>> outdated.
>>> 7230,7231d7211
>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as
>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>> 7275c7255
>>>
>>> Thanks a lot!
>>>
>>> Best,
>>> Pengcheng Yang
>>>
>>>
>>>
>>> On 2015/8/3 16:14, Julie Sullivan wrote:
>>>> Here are the docs on the kegg source:
>>>>
>>>> http://intermine.readthedocs.org/en/latest/database/data-sources/library/pathways/kegg/ 
>>>>
>>>>
>>>>
>>>> KEGG uses its own prefix, which InterMine does not know. You have to
>>>> configure this in the config file.
>>>>
>>>> e.g. KEGG uses "dme" for Drosophila melanogaster and the data file is
>>>> named "dme_gene_map.tab".
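
To illustrate the naming convention described above, here is a minimal Python sketch that recovers the KEGG abbreviation from a data file name. It assumes the abbreviation is everything before `_gene_map.tab`, as in the dme example. `organism_abbreviation` is a hypothetical helper, not InterMine code.

```python
from pathlib import Path

# Assumption: gene map files are named '<abbr>_gene_map.tab'.
SUFFIX = "_gene_map.tab"


def organism_abbreviation(filename):
    """Return the KEGG abbreviation encoded in a file name, or None."""
    name = Path(filename).name
    if not name.endswith(SUFFIX) or name == SUFFIX:
        return None
    return name[: -len(SUFFIX)]


for candidate in ["dme_gene_map.tab", "map_title.tab"]:
    print(candidate, "->", organism_abbreviation(candidate))
```

The abbreviation found this way is the one that would need a `<abbr>.taxonId` entry in kegg_config.properties.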
>>>>
>>>> The reason malaria worked is that it is already configured:
>>>>
>>>> https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties#L37 
>>>>
>>>>
>>>>
>>>> You have two options:
>>>>
>>>> 1. remove the taxon ID from your project XML file; all genes will
>>>> then be loaded
>>>>
>>>> 2. configure the taxon ID in the kegg_config.properties
>>>>
>>>>
>>>>
>>>> On 03/08/15 08:55, Pengcheng Yang wrote:
>>>>> Hi Julie Sullivan,
>>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> I have listed the kegg-pathway part of the project.xml file for the
>>>>> two mines. They seem to have no differences except the path and
>>>>> the organisms.
>>>>>
>>>>> [1] The project.xml of my mine:
>>>>> ----------------------
>>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>>         <property name="kegg.organisms" value="1111"/>
>>>>>        <property name="src.data.dir" 
>>>>> location="/path/to/mymine/kegg/"/>
>>>>>      </source>
>>>>>
>>>>> [2] The project.xml of malariamine
>>>>>      <source name="kegg-pathway" type="kegg-pathway">
>>>>>        <property name="kegg.organisms" value="36329"/>
>>>>>        <property name="src.data.dir" 
>>>>> location="/path/to/malaria/kegg/"/>
>>>>>      </source>
>>>>>
>>>>> I have checked the org_gene_map.tab file; its format is indeed:
>>>>> GeneID<tab>mapid<space>mapid<space>mapid
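
The format quoted above (a gene ID, a tab, then space-separated map IDs) can be checked mechanically. The following Python sketch is only an illustration under that assumption: `parse_gene_map_line` and `malformed_lines` are hypothetical helpers, and the gene and map IDs are invented.

```python
def parse_gene_map_line(line):
    """Split 'GeneID<TAB>mapid mapid ...' into (gene_id, [map_ids])."""
    gene_id, _, pathways = line.rstrip("\n").partition("\t")
    return gene_id.strip(), pathways.split()


def malformed_lines(lines):
    """Return 1-based numbers of non-blank lines lacking a gene ID or map IDs."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        gene_id, map_ids = parse_gene_map_line(line)
        if not gene_id or not map_ids:
            bad.append(number)
    return bad


# Invented sample: line 2 uses a space where the tab should be.
sample_lines = [
    "GENE0001\t00010 00020 00030\n",
    "GENE0002 00010 00020\n",
    "\n",
]
print(parse_gene_map_line(sample_lines[0]))
print(malformed_lines(sample_lines))
```

A line that uses a space instead of the tab has no map IDs after splitting, so it is flagged.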
>>>>>
>>>>> Best,
>>>>> Pengcheng Yang
>>>>>
>>>>> On 2015/8/3 15:32, Julie Sullivan wrote:
>>>>>> Sorry you are having problems with the kegg source!
>>>>>>
>>>>>> Can you clarify what is different about the two project XML files?
>>>>>>
>>>>>> On 02/08/15 10:01, Pengcheng Yang wrote:
>>>>>>> Hi InterMine developers,
>>>>>>>
>>>>>>> Thank you to everyone who answered my earlier questions. Here is
>>>>>>> another problem that is blocking the deployment of my InterMine.
>>>>>>>
>>>>>>> To load kegg-pathway data, I set up project.xml as in malariamine
>>>>>>> and prepared the two files map_title.tab and org_gene_map.tab.
>>>>>>> When I load the data using
>>>>>>> "ant -Dsource=kegg-pathway -v 1> kegg-pathway.log1 2> kegg-pathway.log2",
>>>>>>> kegg-pathway.log1 ends with the output in [1]. However, when I
>>>>>>> query the postgres database with the SQL statement
>>>>>>> "select * from genespathways", nothing is returned.
>>>>>>>
>>>>>>> When I do the same thing for malariamine after loading
>>>>>>> kegg-pathway data, I get the pathway-to-gene information listed
>>>>>>> in [2]. So I compared the log output of my mine and malariamine,
>>>>>>> and found that my mine has not built several of the indexes
>>>>>>> listed in [3].
>>>>>>>
>>>>>>> Since I used the same kegg-pathway source as malariamine, what is
>>>>>>> the problem here?
>>>>>>>
>>>>>>> Any suggestions and comments are welcome! Thanks a lot!
>>>>>>>
>>>>>>> Best,
>>>>>>> Pengcheng Yang
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------
>>>>>>> [1] Build-successful log output from my mine after loading
>>>>>>> kegg-pathway:
>>>>>>> BUILD SUCCESSFUL
>>>>>>> Total time: 21 seconds
>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>> [Thread-8] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.production is being shutdown.
>>>>>>>
>>>>>>> [2] genespathways table from malariamine database.
>>>>>>>
>>>>>>>   pathways |  genes
>>>>>>> ----------+---------
>>>>>>>    2000002 | 1002796
>>>>>>>    2000002 | 1003874
>>>>>>>    2000002 | 1004075
>>>>>>>
>>>>>>> [3] Log lines that appear for malariamine but not for my mine:
>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_secondaryidentifier ON Gene (secondaryIdentifier, organismid)
>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_symbol_org ON Gene (symbol, organismid)
>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_primaryidentifier ON Gene (primaryIdentifier)
>>>>>>>   [integrate] Creating index: CREATE INDEX Organism__key_taxonid ON Organism (taxonId)
>>>>>>>   [integrate] Creating index: CREATE INDEX SOTerm__key ON SOTerm (name, ontologyid)
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> dev at intermine.org
>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev




