[InterMine Dev] kegg-pathway load: genespathways table is empty

Pengcheng Yang yangpc at biols.ac.cn
Mon Aug 3 13:35:46 BST 2015


Hi Julie Sullivan,

Thanks for your reply. The "org" and "1111" values are made up for 
confidentiality reasons.

I think I have found the cause of this problem following your suggestions.

I set my taxonId to the malaria one, 36329, in project.xml and renamed my 
org_gene_map.tab file to pfa_gene_map.tab; the load then succeeded. 
"select count(*) from genespathways" also gives the correct number of 
gene-to-pathway pairs. It seems that the taxonId must exist in the 
list at http://www.genome.jp/kegg/catalog/org_list.html. Unfortunately, 
the taxonId of my organism is not in the list. Is this the reason? How 
can I resolve this problem?
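
To illustrate the relationship described above, here is a minimal Python 
sketch (not InterMine code). It assumes only what appears in this thread: 
kegg_config.properties holds lines of the form "<abbr>.taxonId = <id>", and 
the data file is named "<abbr>_gene_map.tab". The function names and the 
sample text are hypothetical, and the case-sensitive match on "taxonId" is 
an assumption about how the key is read.

```python
import re

# Hypothetical helpers for illustration; only the "<abbr>.taxonId = <id>" key
# format and the "<abbr>_gene_map.tab" file naming are taken from the thread.
# Matching "taxonId" case-sensitively is an assumption.
TAXON_LINE = re.compile(r"^\s*([A-Za-z0-9]+)\.taxonId\s*=\s*(\d+)\s*$")


def taxon_to_abbr(config_text):
    """Map taxon ID (as a string) to organism abbreviation."""
    mapping = {}
    for line in config_text.splitlines():
        match = TAXON_LINE.match(line)
        if match:
            abbr, taxon_id = match.groups()
            mapping[taxon_id] = abbr
    return mapping


def expected_gene_map_file(config_text, taxon_id):
    """Return the expected data file name, or None if the ID is not configured."""
    abbr = taxon_to_abbr(config_text).get(str(taxon_id))
    return f"{abbr}_gene_map.tab" if abbr else None


# Sample configuration text using two entries quoted later in this thread.
SAMPLE_CONFIG = """
# malaria
pfa.taxonId = 36329
# melanogaster
dme.taxonId = 7227
"""

print(expected_gene_map_file(SAMPLE_CONFIG, 36329))
print(expected_gene_map_file(SAMPLE_CONFIG, 1111))
```

With the sample text, the first call yields the pfa file name and the second 
yields None, because no entry maps taxon ID 1111 to an abbreviation.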

Best,
Pengcheng Yang


On 2015/8/3 17:24, Julie Sullivan wrote:
> "org" doesn't match any abbreviations kegg uses. Here is the full list:
>
>     http://www.genome.jp/kegg/catalog/org_list.html
>
> On 03/08/15 10:22, Pengcheng Yang wrote:
>>
>> The content of file
>> bio/sources/kegg-pathway/main/resources/kegg_config.properties:
>>
>> # configuration file that determines which fields are set in the kegg converted
>> #
>> # <ORGANISM ABBR>.taxonId
>> # <ORGANISM ABBR>.identifier = which gene field to set
>> # if identifier is not set, primaryIdentifier will be used
>> #
>> # NOTE: this is only a configuration file.  to actually load organisms, add them to the KEGG entry in project.xml
>> # See http://www.genome.jp/kegg/catalog/org_list.html for list of organism abbreviations
>>
>>
>> # melanogaster
>> dme.taxonId = 7227
>> dme.identifier = primaryIdentifier
>>
>> # human
>> hsa.taxonId = 9606
>> hsa.identifier = symbol
>>
>> # mouse
>> mmu.taxonId = 10090
>> mmu.identifier = primaryIdentifier
>>
>> # rat
>> rno.taxonId = 10116
>>
>> # yeast
>> sce.taxonId = 4932
>>
>> # zebrafish
>> dre.taxonId = 7955
>>
>> # worm
>> cel.taxonId = 6239
>> cel.taxonId = 6239
>>
>> # malaria
>> pfa.taxonId = 36329
>>
>> #my org
>> org.taxonid = 1111
>>
>>
>> On 2015/8/3 17:15, Pengcheng Yang wrote:
>>> Sure,
>>>
>>> Content of the file
>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties:
>>>
>>> Organism.key_taxonid=taxonId
>>> DataSource.key_name=name
>>> Gene.key_primaryidentifier=primaryIdentifier
>>> Gene.key_symbol_org=symbol, organism
>>> Gene.key_secondaryidentifier=secondaryIdentifier, organism
>>> DataSet.key_title=name
>>> SOTerm.key=name, ontology
>>> Ontology.key_title=name
>>> pfa.taxonId = 36329
>>> org.taxonId = 1111
>>>
>>> Best,
>>> Pengcheng Yang
>>>
>>> On 2015/8/3 17:10, Julie Sullivan wrote:
>>>> Can you send me the configuration you added to the file?
>>>>
>>>> This is the list of KEGG organisms and associated abbreviations:
>>>>
>>>>     http://www.genome.jp/kegg/catalog/org_list.html
>>>>
>>>>
>>>> On 03/08/15 10:06, Pengcheng Yang wrote:
>>>>>
>>>>> Hi Chen Yian and Julie Sullivan,
>>>>>
>>>>> Thank you for your reply and the information.
>>>>>
>>>>> I have tried both of the following methods, but the table
>>>>> "genespathways" remains empty:
>>>>> 1) Adding org.taxonId=1111 to the
>>>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties file.
>>>>> Here the taxonId and organism name are made up for confidentiality
>>>>> reasons.
>>>>> 2) Removing the "kegg.organisms" property from the project.xml file.
>>>>>
>>>>> I have also checked some related information that may be useful:
>>>>> 1) "select * from pathway" returns the expected information.
>>>>> 2) I compared the kegg-pathway.log1 files from mymine and malariamine
>>>>> and found the following lines, which are specific to mymine and do
>>>>> not appear in malariamine's kegg-pathway.log1:
>>>>>
>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java added as org/intermine/model/bio/ProteinDomainRegionShadow.class is outdated.
>>>>> 3172,3173d3169
>>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>> 3217c3213
>>>>> <     [javac] /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegion.java
>>>>> <     [javac] /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>>> 3352,3353d3345
>>>>> <     [javac] /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegion.java
>>>>> <     [javac] /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegionShadow.java
>>>>> 3540,3541d3531
>>>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegion.class added as org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegionShadow.class added as org/intermine/model/bio/ProteinDomainRegionShadow.class is outdated.
>>>>> 3543,3544d3532
>>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegion.class added as org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegionShadow.class added as org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinDomainRegion.class
>>>>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>>> 3720,3721d3705
>>>>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinRegion.class
>>>>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinRegionShadow.class
>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java added as org/intermine/model/bio/ProteinDomainRegionShadow.class is outdated.
>>>>> 7230,7231d7211
>>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>> 7275c7255
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Best,
>>>>> Pengcheng Yang
>>>>>
>>>>>
>>>>>
>>>>> On 2015/8/3 16:14, Julie Sullivan wrote:
>>>>>> Here are the docs on the kegg source:
>>>>>>
>>>>>> http://intermine.readthedocs.org/en/latest/database/data-sources/library/pathways/kegg/ 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> KEGG uses its own prefix, which InterMine does not know. You have to
>>>>>> configure this in the config file.
>>>>>>
>>>>>> e.g. KEGG uses "dme" for Drosophila melanogaster and the data 
>>>>>> file is
>>>>>> named "dme_gene_map.tab".
>>>>>>
>>>>>> The reason why malaria worked is that it is already configured:
>>>>>>
>>>>>> https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties#L37 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> You have two options:
>>>>>>
>>>>>> 1. remove the taxon ID from your project XML file; all genes will
>>>>>> then be loaded
>>>>>>
>>>>>> 2. configure the taxon ID in the kegg_config.properties
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 03/08/15 08:55, Pengcheng Yang wrote:
>>>>>>> Hi Julie Sullivan,
>>>>>>>
>>>>>>> Thank you for your reply.
>>>>>>>
>>>>>>> I have listed the kegg-pathway part of the project.xml file for the
>>>>>>> two mines. They seem to differ only in the path and the organism.
>>>>>>>
>>>>>>> [1] The project.xml of my mine:
>>>>>>> ----------------------
>>>>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>>>>   <property name="kegg.organisms" value="1111"/>
>>>>>>>   <property name="src.data.dir" location="/path/to/mymine/kegg/"/>
>>>>>>> </source>
>>>>>>>
>>>>>>> [2] The project.xml of malariamine:
>>>>>>> ----------------------
>>>>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>>>>   <property name="kegg.organisms" value="36329"/>
>>>>>>>   <property name="src.data.dir" location="/path/to/malaria/kegg/"/>
>>>>>>> </source>
>>>>>>>
>>>>>>> I have checked the org_gene_map.tab file; its format is indeed:
>>>>>>> GeneID<tab>mapid<space>mapid<space>mapid
>>>>>>>
>>>>>>> Best,
>>>>>>> Pengcheng Yang
>>>>>>>
>>>>>>> On 2015/8/3 15:32, Julie Sullivan wrote:
>>>>>>>> Sorry you are having problems with the kegg source!
>>>>>>>>
>>>>>>>> Can you clarify what is different about the two project XML files?
>>>>>>>>
>>>>>>>> On 02/08/15 10:01, Pengcheng Yang wrote:
>>>>>>>>> Hi InterMiner developers,
>>>>>>>>>
>>>>>>>>> Thank you to everyone who answered my questions. Here is another
>>>>>>>>> problem that is blocking the deployment of my InterMine.
>>>>>>>>>
>>>>>>>>> To load kegg-pathway data, I set up project.xml as in malariamine
>>>>>>>>> and prepared the two files map_title.tab and org_gene_map.tab.
>>>>>>>>> When I load the data using
>>>>>>>>> "ant -Dsource=kegg-pathway -v 1> kegg-pathway.log1 2> kegg-pathway.log2",
>>>>>>>>> kegg-pathway.log1 ends with the output shown in [1]. However, when
>>>>>>>>> I query the postgres database with the SQL statement
>>>>>>>>> "select * from genespathways", nothing is returned.
>>>>>>>>>
>>>>>>>>> But when I do the same thing for malariamine after loading
>>>>>>>>> kegg-pathway data, I get the pathway-to-gene information listed
>>>>>>>>> in [2]. So I compared the log output of my mine and malariamine,
>>>>>>>>> and found that my mine has not built several of the indexes
>>>>>>>>> listed in [3].
>>>>>>>>>
>>>>>>>>> Since I have used the same kegg-pathway source as malariamine,
>>>>>>>>> what is the problem here?
>>>>>>>>>
>>>>>>>>> Any suggestions and comments are welcome! Thanks a lot!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Pengcheng Yang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------
>>>>>>>>> [1] Build log output from my mine after loading kegg-pathway:
>>>>>>>>> BUILD SUCCESSFUL
>>>>>>>>> Total time: 21 seconds
>>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>>>> [Thread-8] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.production is being shutdown.
>>>>>>>>>
>>>>>>>>> [2] genespathways table from malariamine database.
>>>>>>>>>
>>>>>>>>>   pathways |  genes
>>>>>>>>> ----------+---------
>>>>>>>>>    2000002 | 1002796
>>>>>>>>>    2000002 | 1003874
>>>>>>>>>    2000002 | 1004075
>>>>>>>>>
>>>>>>>>> [3] Log lines that appear for malariamine but not for my mine:
>>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_secondaryidentifier ON Gene (secondaryIdentifier, organismid)
>>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_symbol_org ON Gene (symbol, organismid)
>>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_primaryidentifier ON Gene (primaryIdentifier)
>>>>>>>>>   [integrate] Creating index: CREATE INDEX Organism__key_taxonid ON Organism (taxonId)
>>>>>>>>>   [integrate] Creating index: CREATE INDEX SOTerm__key ON SOTerm (name, ontologyid)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> dev mailing list
>>>>>>>>> dev at intermine.org
>>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev




