[InterMine Dev] kegg-pathway load: genespathways table is empty

Pengcheng Yang yangpc at biols.ac.cn
Tue Aug 4 02:25:20 BST 2015


Hi Chen Yian,

Thank you for your information.

Indeed, the error was caused by a typo: "taxonid" should be "taxonId". I
have tested the fix and the data loaded successfully.
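
In case it helps others, here is a minimal Python sketch (not part of
InterMine) of a check that would have caught this. It assumes the only
per-organism keys are the two named in the comments of
kegg_config.properties, "taxonId" and "identifier". The function name
find_miscased_keys is made up for illustration.

```python
# Assumption: these are the only key suffixes the KEGG config uses,
# taken from the comments in kegg_config.properties quoted in this thread.
KNOWN_SUFFIXES = ("taxonId", "identifier")


def find_miscased_keys(properties_text):
    """Return (line_number, key, suggested_key) for keys whose suffix
    matches a known suffix only when case is ignored."""
    problems = []
    for number, line in enumerate(properties_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blanks, comments, and lines that are not key=value pairs.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key = stripped.split("=", 1)[0].strip()
        if "." not in key:
            continue
        prefix, suffix = key.rsplit(".", 1)
        for known in KNOWN_SUFFIXES:
            if suffix != known and suffix.lower() == known.lower():
                problems.append((number, key, f"{prefix}.{known}"))
    return problems


sample = """\
# malaria
pfa.taxonId = 36329

#my org
org.taxonid = 1111
"""

for number, key, suggestion in find_miscased_keys(sample):
    print(f"line {number}: '{key}' should probably be '{suggestion}'")
```

On the sample above this reports line 5, the "org.taxonid" entry.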

Thank you for your help!

Best,
Pengcheng Yang


On 2015/8/4 9:23, Chen, Yian wrote:
> Hi Pengcheng Yang,
>
> I don't think the abbreviation matters.
> As long as the 3-letter abbreviation at the beginning of your file
> name is the same as the one you set in
> "bio/sources/kegg-pathway/main/resources/kegg_config.properties", the
> integration should be fine.
>
> Can you check whether you have a typo in your configuration file?
> I saw "org.taxonid = 1111" in your previous mail; it should be
> "org.taxonId = 1111", with a capital "I".
>
> Best,
>
> Chen
>
>
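
A rough Python sketch of the check described above, comparing the
prefix of each gene map file name with the abbreviations configured in
kegg_config.properties. This is not InterMine code. The rule that the
abbreviation is everything before "_gene_map.tab" is my assumption
based on the file names mentioned in this thread, and both function
names are made up.

```python
def configured_abbreviations(properties_text):
    """Map each organism abbreviation to its taxon ID, using only
    correctly cased '<abbr>.taxonId' entries."""
    abbreviations = {}
    for line in properties_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = (part.strip() for part in stripped.split("=", 1))
        if key.endswith(".taxonId"):
            abbreviations[key[: -len(".taxonId")]] = value
    return abbreviations


def unmatched_gene_maps(file_names, abbreviations):
    """Return '*_gene_map.tab' file names whose prefix is not configured."""
    suffix = "_gene_map.tab"
    return [
        name
        for name in file_names
        if name.endswith(suffix) and name[: -len(suffix)] not in abbreviations
    ]


# The mis-cased 'org.taxonid' line is ignored, so 'org' is not configured.
config = "pfa.taxonId = 36329\norg.taxonid = 1111\n"
abbrs = configured_abbreviations(config)
files = ["pfa_gene_map.tab", "org_gene_map.tab", "map_title.tab"]
print(unmatched_gene_maps(files, abbrs))
```

With the mis-cased key, only org_gene_map.tab is reported as having no
matching configuration.
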
> On 2015/08/03 21:35, Pengcheng Yang wrote:
>> Hi Julie Sullivan,
>>
>> Thanks for your reply. The "org" and "1111" values are made up for
>> confidentiality reasons.
>>
>> I think I have found the cause of this problem following your 
>> suggestions.
>>
>> I set my taxonId to malaria (36329) in project.xml and changed
>> my org_gene_map.tab file name to pfa_gene_map.tab, and then the load
>> succeeded. "select count(*) from genespathways" also gives the
>> correct number of gene2pathway pairs. It seems that the taxonId
>> must exist in the list at
>> http://www.genome.jp/kegg/catalog/org_list.html. Unfortunately, the
>> taxonId of my organism is not in that list. Is this the reason? How
>> can I resolve this problem?
>>
>> Best,
>> Pengcheng Yang
>>
>>
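
For comparing against "select count(*) from genespathways", a small
Python sketch that counts distinct gene-to-pathway pairs in a gene map
file. It assumes the format described elsewhere in this thread (gene
ID, a tab, then space-separated map IDs). The gene IDs and map IDs in
the sample are invented. The database count could still be lower if
the loader skips some genes, so treat this as a rough reference only.

```python
def count_gene_pathway_pairs(gene_map_text):
    """Count distinct (gene, pathway) pairs in gene map file content."""
    pairs = set()
    for line in gene_map_text.splitlines():
        if not line.strip():
            continue
        # Everything before the first tab is the gene ID; the rest is a
        # whitespace-separated list of pathway map IDs.
        gene_id, _, map_ids = line.partition("\t")
        for map_id in map_ids.split():
            pairs.add((gene_id.strip(), map_id))
    return len(pairs)


# Invented sample content: geneA is in two pathways, geneB in one.
sample_map = "geneA\t00010 00020\ngeneB\t00010\n"
print(count_gene_pathway_pairs(sample_map))
```
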
>> On 2015/8/3 17:24, Julie Sullivan wrote:
>>> "org" doesn't match any abbreviations kegg uses. Here is the full list:
>>>
>>>     http://www.genome.jp/kegg/catalog/org_list.html
>>>
>>> On 03/08/15 10:22, Pengcheng Yang wrote:
>>>>
>>>> The content of file
>>>> bio/sources/kegg-pathway/main/resources/kegg_config.properties:
>>>>
>>>> # configuration file that determines which fields are set in the kegg converted
>>>> #
>>>> # <ORGANISM ABBR>.taxonId
>>>> # <ORGANISM ABBR>.identifier = which gene field to set
>>>> # if identifier is not set, primaryIdentifier will be used
>>>> #
>>>> # NOTE: this is only a configuration file.  to actually load organisms, add them to the KEGG entry in project.xml
>>>> # See http://www.genome.jp/kegg/catalog/org_list.html for list of organism abbreviations
>>>>
>>>>
>>>> # melanogaster
>>>> dme.taxonId = 7227
>>>> dme.identifier = primaryIdentifier
>>>>
>>>> # human
>>>> hsa.taxonId = 9606
>>>> hsa.identifier = symbol
>>>>
>>>> # mouse
>>>> mmu.taxonId = 10090
>>>> mmu.identifier = primaryIdentifier
>>>>
>>>> # rat
>>>> rno.taxonId = 10116
>>>>
>>>> # yeast
>>>> sce.taxonId = 4932
>>>>
>>>> # zebrafish
>>>> dre.taxonId = 7955
>>>>
>>>> # worm
>>>> cel.taxonId = 6239
>>>> cel.taxonId = 6239
>>>>
>>>> # malaria
>>>> pfa.taxonId = 36329
>>>>
>>>> #my org
>>>> org.taxonid = 1111
>>>>
>>>>
>>>> On 2015/8/3 17:15, Pengcheng Yang wrote:
>>>>> Sure,
>>>>>
>>>>> Content of the file
>>>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties:
>>>>>
>>>>> Organism.key_taxonid=taxonId
>>>>> DataSource.key_name=name
>>>>> Gene.key_primaryidentifier=primaryIdentifier
>>>>> Gene.key_symbol_org=symbol, organism
>>>>> Gene.key_secondaryidentifier=secondaryIdentifier, organism
>>>>> DataSet.key_title=name
>>>>> SOTerm.key=name, ontology
>>>>> Ontology.key_title=name
>>>>> pfa.taxonId = 36329
>>>>> org.taxonId = 1111
>>>>>
>>>>> Best,
>>>>> Pengcheng Yang
>>>>>
>>>>> On 2015/8/3 17:10, Julie Sullivan wrote:
>>>>>> Can you send me the configuration you added to the file?
>>>>>>
>>>>>> This is the list of KEGG organisms and associated abbreviations:
>>>>>>
>>>>>>     http://www.genome.jp/kegg/catalog/org_list.html
>>>>>>
>>>>>>
>>>>>> On 03/08/15 10:06, Pengcheng Yang wrote:
>>>>>>>
>>>>>>> Hi Chen Yian and Julie Sullivan,
>>>>>>>
>>>>>>> Thank you for your reply and the information.
>>>>>>>
>>>>>>> I have tried both of the following methods, but the table
>>>>>>> "genespathways" remains empty.
>>>>>>> 1) Adding org.taxonId=1111 to the
>>>>>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties
>>>>>>> file. The taxonId and organism name here are made up for
>>>>>>> confidentiality reasons.
>>>>>>> 2) Removing the "kegg.organisms" property from the project.xml file.
>>>>>>>
>>>>>>> I have also checked the following, which may be useful:
>>>>>>> 1) "select * from pathway" returns the expected information.
>>>>>>> 2) I compared the kegg-pathway.log1 files from mymine and
>>>>>>> malariamine and found the following lines, which are specific to
>>>>>>> mymine and do not appear in malariamine's kegg-pathway.log1:
>>>>>>>
>>>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java 
>>>>>>> added as
>>>>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>>>> <     [javac] 
>>>>>>> org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>>>>> outdated.
>>>>>>> 3172,3173d3169
>>>>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>>>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java 
>>>>>>> added as
>>>>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>>>> 3217c3213
>>>>>>> <     [javac]
>>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegion.java 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <     [javac]
>>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegionShadow.java 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3352,3353d3345
>>>>>>> <     [javac]
>>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegion.java 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <     [javac]
>>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegionShadow.java 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3540,3541d3531
>>>>>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegion.class
>>>>>>> added as
>>>>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>>>> <   [lib:jar] 
>>>>>>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>>>>> outdated.
>>>>>>> 3543,3544d3532
>>>>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegion.class added as
>>>>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegionShadow.class
>>>>>>> added as
>>>>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>>>> <   [lib:jar] adding entry
>>>>>>> org/intermine/model/bio/ProteinDomainRegion.class
>>>>>>> <   [lib:jar] adding entry
>>>>>>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>>>>> 3720,3721d3705
>>>>>>> <   [lib:jar] adding entry 
>>>>>>> org/intermine/model/bio/ProteinRegion.class
>>>>>>> <   [lib:jar] adding entry
>>>>>>> org/intermine/model/bio/ProteinRegionShadow.class
>>>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java 
>>>>>>> added as
>>>>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>>>> <     [javac] 
>>>>>>> org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>>>>> outdated.
>>>>>>> 7230,7231d7211
>>>>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>>>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java 
>>>>>>> added as
>>>>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>>>> 7275c7255
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> Best,
>>>>>>> Pengcheng Yang
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2015/8/3 16:14, Julie Sullivan wrote:
>>>>>>>> Here are the docs on the kegg source:
>>>>>>>>
>>>>>>>> http://intermine.readthedocs.org/en/latest/database/data-sources/library/pathways/kegg/ 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> KEGG uses its own prefix, which InterMine does not know. You 
>>>>>>>> have to
>>>>>>>> configure this in the config file.
>>>>>>>>
>>>>>>>> e.g. KEGG uses "dme" for Drosophila melanogaster and the data 
>>>>>>>> file is
>>>>>>>> named "dme_gene_map.tab".
>>>>>>>>
>>>>>>>> The reason why malaria worked is that it is already configured:
>>>>>>>>
>>>>>>>> https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties#L37 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> You have two options:
>>>>>>>>
>>>>>>>> 1. remove the taxon ID from your project XML file; all genes
>>>>>>>> will then be loaded
>>>>>>>>
>>>>>>>> 2. configure the taxon ID in the kegg_config.properties
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/08/15 08:55, Pengcheng Yang wrote:
>>>>>>>>> Hi Julie Sullivan,
>>>>>>>>>
>>>>>>>>> Thank you for your reply.
>>>>>>>>>
>>>>>>>>> I have listed the kegg-pathway part of the project.xml file for
>>>>>>>>> the two mines below.
>>>>>>>>> They seem to differ only in the path and the organisms.
>>>>>>>>>
>>>>>>>>> [1] The project.xml of my mine:
>>>>>>>>> ----------------------
>>>>>>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>>>>>>   <property name="kegg.organisms" value="1111"/>
>>>>>>>>>   <property name="src.data.dir" location="/path/to/mymine/kegg/"/>
>>>>>>>>> </source>
>>>>>>>>>
>>>>>>>>> [2] The project.xml of malariamine
>>>>>>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>>>>>>   <property name="kegg.organisms" value="36329"/>
>>>>>>>>>   <property name="src.data.dir" location="/path/to/malaria/kegg/"/>
>>>>>>>>> </source>
>>>>>>>>>
>>>>>>>>> I have checked the org_gene_map.tab file; its format is indeed:
>>>>>>>>> GeneID<tab>mapid<space>mapid<space>mapid
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Pengcheng Yang
>>>>>>>>>
>>>>>>>>> On 2015/8/3 15:32, Julie Sullivan wrote:
>>>>>>>>>> Sorry you are having problems with the kegg source!
>>>>>>>>>>
>>>>>>>>>> Can you clarify what is different about the two project XML 
>>>>>>>>>> files?
>>>>>>>>>>
>>>>>>>>>> On 02/08/15 10:01, Pengcheng Yang wrote:
>>>>>>>>>>> Hi InterMine developers,
>>>>>>>>>>>
>>>>>>>>>>> Thank you to all who answered my questions. Here is another
>>>>>>>>>>> question that is blocking my way to deploying my InterMine.
>>>>>>>>>>>
>>>>>>>>>>> To load kegg-pathway data, I set up project.xml as in
>>>>>>>>>>> malariamine and prepared the two files map_title.tab and
>>>>>>>>>>> org_gene_map.tab. When I load the data using
>>>>>>>>>>> "ant -Dsource=kegg-pathway -v 1> kegg-pathway.log1
>>>>>>>>>>> 2> kegg-pathway.log2", kegg-pathway.log1 ends with [1].
>>>>>>>>>>> However, when I query the postgres database with the SQL
>>>>>>>>>>> statement "select * from genespathways", nothing is returned.
>>>>>>>>>>>
>>>>>>>>>>> But when I do the same thing for malariamine after loading
>>>>>>>>>>> kegg-pathway data, I get the pathways-to-genes information
>>>>>>>>>>> listed in [2]. So I compared the log output of my mine and
>>>>>>>>>>> malariamine, and found that my mine has not built several of
>>>>>>>>>>> the indexes listed in [3].
>>>>>>>>>>>
>>>>>>>>>>> Since I used the same kegg-pathway source as malariamine,
>>>>>>>>>>> what is the problem here?
>>>>>>>>>>>
>>>>>>>>>>> Any suggestions and comments are welcome! Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Pengcheng Yang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------
>>>>>>>>>>> [1] build successful log information from my mine after loading
>>>>>>>>>>> kegg-pathway
>>>>>>>>>>> BUILD SUCCESSFUL
>>>>>>>>>>> Total time: 21 seconds
>>>>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>>>>>> [Thread-8] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.production is being shutdown.
>>>>>>>>>>>
>>>>>>>>>>> [2] genespathways table from malariamine database.
>>>>>>>>>>>
>>>>>>>>>>>   pathways |  genes
>>>>>>>>>>> ----------+---------
>>>>>>>>>>>    2000002 | 1002796
>>>>>>>>>>>    2000002 | 1003874
>>>>>>>>>>>    2000002 | 1004075
>>>>>>>>>>>
>>>>>>>>>>> [3] log information that appears for malariamine but not for
>>>>>>>>>>> my mine.
>>>>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_secondaryidentifier ON Gene (secondaryIdentifier, organismid)
>>>>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_symbol_org ON Gene (symbol, organismid)
>>>>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_primaryidentifier ON Gene (primaryIdentifier)
>>>>>>>>>>>   [integrate] Creating index: CREATE INDEX Organism__key_taxonid ON Organism (taxonId)
>>>>>>>>>>>   [integrate] Creating index: CREATE INDEX SOTerm__key ON SOTerm (name, ontologyid)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dev mailing list
>>>>>>>>>>> dev at intermine.org
>>>>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>




