[InterMine Dev] kegg-pathway load: genespathways table is empty

Chen, Yian chenyian at nibiohn.go.jp
Tue Aug 4 02:23:08 BST 2015


Hi Pengcheng Yang,

I don't think the abbreviation itself matters.
As long as the 3-letter abbreviation at the beginning of your file name 
is the same as the one you set in 
"bio/sources/kegg-pathway/main/resources/kegg_config.properties", the 
integration should be fine.
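
As a rough illustration of that matching rule, the sketch below reports which *_gene_map.tab files have no matching "<abbreviation>.taxonId" entry. It is only an approximation written for this thread, not the converter's actual code: the function names parse_properties and unconfigured_prefixes are hypothetical, "xyz" is a placeholder abbreviation, and the file-name pattern is an assumption based on the file names mentioned in this thread.

```python
import re


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def unconfigured_prefixes(file_names, props):
    """Return file-name prefixes that have no '<prefix>.taxonId' key."""
    missing = []
    for name in file_names:
        # Assumed naming convention: <abbreviation>_gene_map.tab
        match = re.match(r"^(\S+)_gene_map\.tab$", name)
        if match and f"{match.group(1)}.taxonId" not in props:
            missing.append(match.group(1))
    return missing


# Sample config text; 'xyz' below is a placeholder with no entry here.
config = """
# melanogaster
dme.taxonId = 7227
# malaria
pfa.taxonId = 36329
"""
props = parse_properties(config)
files = ["pfa_gene_map.tab", "xyz_gene_map.tab", "map_title.tab"]
print(unconfigured_prefixes(files, props))
```

Any prefix it reports would need a "<prefix>.taxonId" line in kegg_config.properties before its genes can be matched to an organism.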

Can you check whether you have a typo in your configuration file?
I saw "org.taxonid = 1111" in your previous mail; it should be 
"org.taxonId = 1111", with a capital "I".
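
Java properties keys are case-sensitive, so this kind of typo is easy to miss by eye. A small check like the following could flag it. This is a minimal sketch: the helper name miscased_keys is hypothetical, and the list of expected suffixes is my assumption, taken from the comments at the top of kegg_config.properties.

```python
# Key suffixes named in the comments of kegg_config.properties (assumed list).
EXPECTED_SUFFIXES = ("taxonId", "identifier")


def miscased_keys(property_text):
    """Return (found_key, suggested_key) pairs for keys wrong only in case."""
    found = []
    for line in property_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        suffix = key.rpartition(".")[2]
        for expected in EXPECTED_SUFFIXES:
            # Same letters, different capitalisation: very likely a typo.
            if suffix != expected and suffix.lower() == expected.lower():
                found.append((key, key[: len(key) - len(suffix)] + expected))
    return found


sample = "pfa.taxonId = 36329\norg.taxonid = 1111\nhsa.identifier = symbol\n"
print(miscased_keys(sample))
```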

Best,

Chen


On 2015/08/03 21:35, Pengcheng Yang wrote:
> Hi Julie Sullivan,
>
> Thanks for your reply. The "org" and "1111" values are made up for 
> confidentiality reasons.
>
> I think I have found the cause of this problem following your 
> suggestions.
>
> I set my taxonId to the malaria value, 36329, in project.xml and 
> renamed my org_gene_map.tab file to pfa_gene_map.tab; the load then 
> succeeded. "select count(*) from genespathways" also gives the 
> correct number of gene-to-pathway pairs. It seems that the taxonId 
> must exist in the list at 
> http://www.genome.jp/kegg/catalog/org_list.html. Unfortunately, the 
> taxonId of my organism is not in that list. Is this the reason? How can 
> I resolve this problem?
>
> Best,
> Pengcheng Yang
>
>
> On 2015/8/3 17:24, Julie Sullivan wrote:
>> "org" doesn't match any abbreviations kegg uses. Here is the full list:
>>
>>     http://www.genome.jp/kegg/catalog/org_list.html
>>
>> On 03/08/15 10:22, Pengcheng Yang wrote:
>>>
>>> The content of file
>>> bio/sources/kegg-pathway/main/resources/kegg_config.properties:
>>>
>>> # configuration file that determines which fields are set in the kegg
>>> converted
>>> #
>>> # <ORGANISM ABBR>.taxonId
>>> # <ORGANISM ABBR>.identifier = which gene field to set
>>> # if identifier is not set, primaryIdentifier will be used
>>> #
>>> # NOTE: this is only a configuration file.  to actually load organisms,
>>> add them to the KEGG entry in project.xml
>>> # See http://www.genome.jp/kegg/catalog/org_list.html for list of
>>> organism abbreviations
>>>
>>>
>>> # melanogaster
>>> dme.taxonId = 7227
>>> dme.identifier = primaryIdentifier
>>>
>>> # human
>>> hsa.taxonId = 9606
>>> hsa.identifier = symbol
>>>
>>> # mouse
>>> mmu.taxonId = 10090
>>> mmu.identifier = primaryIdentifier
>>>
>>> # rat
>>> rno.taxonId = 10116
>>>
>>> # yeast
>>> sce.taxonId = 4932
>>>
>>> # zebrafish
>>> dre.taxonId = 7955
>>>
>>> # worm
>>> cel.taxonId = 6239
>>> cel.taxonId = 6239
>>>
>>> # malaria
>>> pfa.taxonId = 36329
>>>
>>> #my org
>>> org.taxonid = 1111
>>>
>>>
>>> On 2015/8/3 17:15, Pengcheng Yang wrote:
>>>> Sure,
>>>>
>>>> Content of the file
>>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties:
>>>>
>>>> Organism.key_taxonid=taxonId
>>>> DataSource.key_name=name
>>>> Gene.key_primaryidentifier=primaryIdentifier
>>>> Gene.key_symbol_org=symbol, organism
>>>> Gene.key_secondaryidentifier=secondaryIdentifier, organism
>>>> DataSet.key_title=name
>>>> SOTerm.key=name, ontology
>>>> Ontology.key_title=name
>>>> pfa.taxonId = 36329
>>>> org.taxonId = 1111
>>>>
>>>> Best,
>>>> Pengcheng Yang
>>>>
>>>> On 2015/8/3 17:10, Julie Sullivan wrote:
>>>>> Can you send me the configuration you added to the file?
>>>>>
>>>>> This is the list of KEGG organisms and associated abbreviations:
>>>>>
>>>>>     http://www.genome.jp/kegg/catalog/org_list.html
>>>>>
>>>>>
>>>>> On 03/08/15 10:06, Pengcheng Yang wrote:
>>>>>>
>>>>>> Hi Chen Yian and Julie Sullivan,
>>>>>>
>>>>>> Thank you for your reply and the information.
>>>>>>
>>>>>> I have tried both of the following methods, but the table
>>>>>> "genespathways" remains empty:
>>>>>> 1) Adding org.taxonId=1111 to the
>>>>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties
>>>>>> file. The taxonId and organism name here are made up for
>>>>>> confidentiality reasons.
>>>>>> 2) Removing the "kegg.organisms" property from the project.xml file.
>>>>>>
>>>>>> I have also checked some related information that may be useful:
>>>>>> 1) "select * from pathway" returns the expected information.
>>>>>> 2) I compared the kegg-pathway.log1 files from mymine and
>>>>>> malariamine and found the following lines, which are specific to
>>>>>> mymine and do not appear in malariamine's kegg-pathway.log1:
>>>>>>
>>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java 
>>>>>> added as
>>>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>>>> outdated.
>>>>>> 3172,3173d3169
>>>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java 
>>>>>> added as
>>>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>>> 3217c3213
>>>>>> <     [javac]
>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegion.java 
>>>>>>
>>>>>>
>>>>>>
>>>>>> <     [javac]
>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegionShadow.java 
>>>>>>
>>>>>>
>>>>>>
>>>>>> 3352,3353d3345
>>>>>> <     [javac]
>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegion.java 
>>>>>>
>>>>>>
>>>>>>
>>>>>> <     [javac]
>>>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegionShadow.java 
>>>>>>
>>>>>>
>>>>>>
>>>>>> 3540,3541d3531
>>>>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegion.class
>>>>>> added as
>>>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>>> <   [lib:jar] 
>>>>>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>>>> outdated.
>>>>>> 3543,3544d3532
>>>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegion.class added as
>>>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegionShadow.class
>>>>>> added as
>>>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>>> <   [lib:jar] adding entry
>>>>>> org/intermine/model/bio/ProteinDomainRegion.class
>>>>>> <   [lib:jar] adding entry
>>>>>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>>>> 3720,3721d3705
>>>>>> <   [lib:jar] adding entry 
>>>>>> org/intermine/model/bio/ProteinRegion.class
>>>>>> <   [lib:jar] adding entry
>>>>>> org/intermine/model/bio/ProteinRegionShadow.class
>>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java 
>>>>>> added as
>>>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>>>> outdated.
>>>>>> 7230,7231d7211
>>>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java 
>>>>>> added as
>>>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>>>> 7275c7255
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>> Best,
>>>>>> Pengcheng Yang
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2015/8/3 16:14, Julie Sullivan wrote:
>>>>>>> Here are the docs on the kegg source:
>>>>>>>
>>>>>>> http://intermine.readthedocs.org/en/latest/database/data-sources/library/pathways/kegg/ 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> KEGG uses its own prefix, which InterMine does not know. You 
>>>>>>> have to
>>>>>>> configure this in the config file.
>>>>>>>
>>>>>>> e.g. KEGG uses "dme" for Drosophila melanogaster and the data 
>>>>>>> file is
>>>>>>> named "dme_gene_map.tab".
>>>>>>>
>>>>>>> The reason why malaria worked is that it is already configured:
>>>>>>>
>>>>>>> https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties#L37 
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> You have two options:
>>>>>>>
>>>>>>> 1. remove the taxon ID from your project XML file, so that all
>>>>>>> genes will be loaded
>>>>>>>
>>>>>>> 2. configure the taxon ID in the kegg_config.properties
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 03/08/15 08:55, Pengcheng Yang wrote:
>>>>>>>> Hi Julie Sullivan,
>>>>>>>>
>>>>>>>> Thank you for your reply.
>>>>>>>>
>>>>>>>> I have listed the kegg-pathway part of the project.xml file for
>>>>>>>> both mines below. They seem to differ only in the path and the
>>>>>>>> organism.
>>>>>>>>
>>>>>>>> [1] The project.xml of my mine:
>>>>>>>> ----------------------
>>>>>>>>      <source name="kegg-pathway" type="kegg-pathway">
>>>>>>>>        <property name="kegg.organisms" value="1111"/>
>>>>>>>>        <property name="src.data.dir"
>>>>>>>> location="/path/to/mymine/kegg/"/>
>>>>>>>>      </source>
>>>>>>>>
>>>>>>>> [2] The project.xml of malariamine
>>>>>>>>      <source name="kegg-pathway" type="kegg-pathway">
>>>>>>>>        <property name="kegg.organisms" value="36329"/>
>>>>>>>>        <property name="src.data.dir"
>>>>>>>> location="/path/to/malaria/kegg/"/>
>>>>>>>>      </source>
>>>>>>>>
>>>>>>>> I have checked the org_gene_map.tab file; its format is indeed:
>>>>>>>> GeneID<tab>mapid<space>mapid<space>mapid
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Pengcheng Yang
>>>>>>>>
>>>>>>>> On 2015/8/3 15:32, Julie Sullivan wrote:
>>>>>>>>> Sorry you are having problems with the kegg source!
>>>>>>>>>
>>>>>>>>> Can you clarify what is different about the two project XML 
>>>>>>>>> files?
>>>>>>>>>
>>>>>>>>> On 02/08/15 10:01, Pengcheng Yang wrote:
>>>>>>>>>> Hi InterMiner developers,
>>>>>>>>>>
>>>>>>>>>> Thank you to everyone who answered my earlier questions. Here is
>>>>>>>>>> another problem that is blocking the deployment of my InterMine.
>>>>>>>>>>
>>>>>>>>>> To load kegg-pathway data, I set up project.xml as in malariamine
>>>>>>>>>> and prepared the two files map_title.tab and org_gene_map.tab.
>>>>>>>>>> When I load the data using "ant -Dsource=kegg-pathway -v 1>
>>>>>>>>>> kegg-pathway.log1 2> kegg-pathway.log2", kegg-pathway.log1 ends
>>>>>>>>>> with the output shown in [1]. However, when I query the postgres
>>>>>>>>>> database with the SQL "select * from genespathways", nothing is
>>>>>>>>>> returned.
>>>>>>>>>>
>>>>>>>>>> When I do the same thing for malariamine after loading
>>>>>>>>>> kegg-pathway data, I get the pathway-to-gene information listed
>>>>>>>>>> in [2]. I then compared the log output of my mine and
>>>>>>>>>> malariamine, and found that my mine has not built several of the
>>>>>>>>>> indexes listed in [3].
>>>>>>>>>>
>>>>>>>>>> Since I used the same kegg-pathway source as malariamine, what
>>>>>>>>>> is the problem here?
>>>>>>>>>>
>>>>>>>>>> Any suggestions and comments are welcome! Thanks a lot!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Pengcheng Yang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------
>>>>>>>>>> [1] build successful log information from my mine after load
>>>>>>>>>> kegg-pathway
>>>>>>>>>> BUILD SUCCESSFUL
>>>>>>>>>> Total time: 21 seconds
>>>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP
>>>>>>>>>> pool db.common-tgt-items is being shutdown.
>>>>>>>>>> [Thread-8] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP
>>>>>>>>>> pool db.common-tgt-items is being shutdown.
>>>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP
>>>>>>>>>> pool db.production is being shutdown.
>>>>>>>>>>
>>>>>>>>>> [2] genespathways table from malariamine database.
>>>>>>>>>>
>>>>>>>>>>   pathways |  genes
>>>>>>>>>> ----------+---------
>>>>>>>>>>    2000002 | 1002796
>>>>>>>>>>    2000002 | 1003874
>>>>>>>>>>    2000002 | 1004075
>>>>>>>>>>
>>>>>>>>>> [3] Log output that appears for malariamine but not for my
>>>>>>>>>> mine.
>>>>>>>>>>   [integrate] Creating index: CREATE INDEX
>>>>>>>>>> Gene__key_secondaryidentifier
>>>>>>>>>> ON Gene (secondaryIdentifier, organismid)
>>>>>>>>>>   [integrate] Creating index: CREATE INDEX 
>>>>>>>>>> Gene__key_symbol_org ON
>>>>>>>>>> Gene
>>>>>>>>>> (symbol, organismid)
>>>>>>>>>>   [integrate] Creating index: CREATE INDEX
>>>>>>>>>> Gene__key_primaryidentifier
>>>>>>>>>> ON Gene (primaryIdentifier)
>>>>>>>>>>   [integrate] Creating index: CREATE INDEX 
>>>>>>>>>> Organism__key_taxonid ON
>>>>>>>>>> Organism (taxonId)
>>>>>>>>>>   [integrate] Creating index: CREATE INDEX SOTerm__key ON SOTerm
>>>>>>>>>> (name,
>>>>>>>>>> ontologyid)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> dev mailing list
>>>>>>>>>> dev at intermine.org
>>>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>



