[InterMine Dev] kegg-pathway load: genespathways table is empty

Julie Sullivan julie at flymine.org
Mon Aug 3 10:24:34 BST 2015


"org" doesn't match any abbreviations kegg uses. Here is the full list:

	http://www.genome.jp/kegg/catalog/org_list.html
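
A quick way to sanity-check the properties file is sketched below in Python. It is only an illustration, not InterMine code: parse_kegg_config and expected_data_file are names made up for this example, and it assumes the property suffixes are case-sensitive, which I have not verified against the converter. It flags keys whose suffix is not exactly taxonId or identifier (the two fields documented in the config comments) and derives the <abbr>_gene_map.tab file name that goes with an abbreviation.

```python
def parse_kegg_config(text):
    """Return ({abbr: {field: value}}, [suspicious keys]) from properties text.

    Hypothetical helper: only the two documented fields are accepted,
    and the comparison is case-sensitive (an assumption).
    """
    known_fields = {"taxonId", "identifier"}
    organisms = {}
    suspicious = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        abbr, _, field = key.partition(".")
        if field not in known_fields:
            suspicious.append(key)
            continue
        organisms.setdefault(abbr, {})[field] = value
    return organisms, suspicious


def expected_data_file(abbr):
    """Data file name for an abbreviation, following the dme_gene_map.tab pattern."""
    return f"{abbr}_gene_map.tab"


# Two entries taken from the config quoted in this thread.
sample = """
# malaria
pfa.taxonId = 36329

#my org
org.taxonid = 1111
"""

organisms, suspicious = parse_kegg_config(sample)
print(organisms)
print(suspicious)
print(expected_data_file("pfa"))
```

Any key reported in the second list is worth a second look before rerunning the load.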

On 03/08/15 10:22, Pengcheng Yang wrote:
>
> The content of file
> bio/sources/kegg-pathway/main/resources/kegg_config.properties:
>
> # configuration file that determines which fields are set in the kegg converted
> #
> # <ORGANISM ABBR>.taxonId
> # <ORGANISM ABBR>.identifier = which gene field to set
> # if identifier is not set, primaryIdentifier will be used
> #
> # NOTE: this is only a configuration file.  to actually load organisms, add them to the KEGG entry in project.xml
> # See http://www.genome.jp/kegg/catalog/org_list.html for list of organism abbreviations
>
>
> # melanogaster
> dme.taxonId = 7227
> dme.identifier = primaryIdentifier
>
> # human
> hsa.taxonId = 9606
> hsa.identifier = symbol
>
> # mouse
> mmu.taxonId = 10090
> mmu.identifier = primaryIdentifier
>
> # rat
> rno.taxonId = 10116
>
> # yeast
> sce.taxonId = 4932
>
> # zebrafish
> dre.taxonId = 7955
>
> # worm
> cel.taxonId = 6239
> cel.taxonId = 6239
>
> # malaria
> pfa.taxonId = 36329
>
> #my org
> org.taxonid = 1111
>
>
> On 2015/8/3 17:15, Pengcheng Yang wrote:
>> Sure,
>>
>> Content of the file
>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties:
>>
>> Organism.key_taxonid=taxonId
>> DataSource.key_name=name
>> Gene.key_primaryidentifier=primaryIdentifier
>> Gene.key_symbol_org=symbol, organism
>> Gene.key_secondaryidentifier=secondaryIdentifier, organism
>> DataSet.key_title=name
>> SOTerm.key=name, ontology
>> Ontology.key_title=name
>> pfa.taxonId = 36329
>> org.taxonId = 1111
>>
>> Best,
>> Pengcheng Yang
>>
>> On 2015/8/3 17:10, Julie Sullivan wrote:
>>> Can you send me the configuration you added to the file?
>>>
>>> This is the list of KEGG organisms and associated abbreviations:
>>>
>>>     http://www.genome.jp/kegg/catalog/org_list.html
>>>
>>>
>>> On 03/08/15 10:06, Pengcheng Yang wrote:
>>>>
>>>> Hi Chen Yian and Julie Sullivan,
>>>>
>>>> Thank you for your reply and the information.
>>>>
>>>> I have tried both of the following methods, but the table
>>>> "genespathways" remains empty.
>>>> 1) Adding org.taxonId=1111 to the
>>>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties file.
>>>> The taxonId and organism name here are made up for confidentiality
>>>> reasons.
>>>> 2) Removing the "kegg.organisms" property from the project.xml file.
>>>>
>>>> I have also checked the following, which may be useful:
>>>> 1) "select * from pathway" returns the expected information.
>>>> 2) I compared the kegg-pathway.log1 files from mymine and malariamine
>>>> and found the following lines, which are specific to mymine and do not
>>>> appear in malariamine's kegg-pathway.log1:
>>>>
>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as
>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>> outdated.
>>>> 3172,3173d3169
>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as
>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>> 3217c3213
>>>> <     [javac]
>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegion.java
>>>>
>>>>
>>>> <     [javac]
>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>>
>>>>
>>>> 3352,3353d3345
>>>> <     [javac]
>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegion.java
>>>>
>>>>
>>>> <     [javac]
>>>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegionShadow.java
>>>>
>>>>
>>>> 3540,3541d3531
>>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegion.class
>>>> added as
>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>> outdated.
>>>> 3543,3544d3532
>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegion.class added as
>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>> <   [lib:jar] org/intermine/model/bio/ProteinRegionShadow.class
>>>> added as
>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>> <   [lib:jar] adding entry
>>>> org/intermine/model/bio/ProteinDomainRegion.class
>>>> <   [lib:jar] adding entry
>>>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>>>> 3720,3721d3705
>>>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinRegion.class
>>>> <   [lib:jar] adding entry
>>>> org/intermine/model/bio/ProteinRegionShadow.class
>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as
>>>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>>>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>>>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>>>> outdated.
>>>> 7230,7231d7211
>>>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>>>> org/intermine/model/bio/ProteinRegion.class is outdated.
>>>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as
>>>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>>>> 7275c7255
>>>>
>>>> Thanks a lot!
>>>>
>>>> Best,
>>>> Pengcheng Yang
>>>>
>>>>
>>>>
>>>> On 2015/8/3 16:14, Julie Sullivan wrote:
>>>>> Here are the docs on the kegg source:
>>>>>
>>>>> http://intermine.readthedocs.org/en/latest/database/data-sources/library/pathways/kegg/
>>>>>
>>>>>
>>>>>
>>>>> KEGG uses its own organism abbreviation as a prefix, which InterMine
>>>>> does not know about. You have to configure this in the config file.
>>>>>
>>>>> e.g. KEGG uses "dme" for Drosophila melanogaster and the data file is
>>>>> named "dme_gene_map.tab".
>>>>>
>>>>> Malaria worked because it is already configured:
>>>>>
>>>>> https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties#L37
>>>>>
>>>>>
>>>>>
>>>>> You have two options:
>>>>>
>>>>> 1. remove the taxon ID from your project XML file; all genes will
>>>>> then be loaded
>>>>>
>>>>> 2. configure the taxon ID in the kegg_config.properties
>>>>>
>>>>>
>>>>>
>>>>> On 03/08/15 08:55, Pengcheng Yang wrote:
>>>>>> Hi Julie Sullivan,
>>>>>>
>>>>>> Thank you for your reply.
>>>>>>
>>>>>> I have listed the kegg-pathway part of the project.xml file for the
>>>>>> two mines. They do not seem to differ except for the path and the
>>>>>> organism.
>>>>>>
>>>>>> [1] The project.xml of my mine:
>>>>>> ----------------------
>>>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>>>         <property name="kegg.organisms" value="1111"/>
>>>>>>        <property name="src.data.dir"
>>>>>> location="/path/to/mymine/kegg/"/>
>>>>>>      </source>
>>>>>>
>>>>>> [2] The project.xml of malariamine
>>>>>>      <source name="kegg-pathway" type="kegg-pathway">
>>>>>>        <property name="kegg.organisms" value="36329"/>
>>>>>>        <property name="src.data.dir"
>>>>>> location="/path/to/malaria/kegg/"/>
>>>>>>      </source>
>>>>>>
>>>>>> I have checked the org_gene_map.tab file; its format is indeed:
>>>>>> GeneID<tab>mapid<space>mapid<space>mapid
>>>>>>
>>>>>> Best,
>>>>>> Pengcheng Yang
>>>>>>
>>>>>> On 2015/8/3 15:32, Julie Sullivan wrote:
>>>>>>> Sorry you are having problems with the kegg source!
>>>>>>>
>>>>>>> Can you clarify what is different about the two project XML files?
>>>>>>>
>>>>>>> On 02/08/15 10:01, Pengcheng Yang wrote:
>>>>>>>> Hi InterMine developers,
>>>>>>>>
>>>>>>>> Thank you to everyone who answered my questions. Here is another
>>>>>>>> question that is blocking the deployment of my InterMine.
>>>>>>>>
>>>>>>>> To load kegg-pathway data, I set up project.xml as in malariamine
>>>>>>>> and prepared the two files map_title.tab and org_gene_map.tab.
>>>>>>>> When I load the data using
>>>>>>>> "ant -Dsource=kegg-pathway -v 1> kegg-pathway.log1 2> kegg-pathway.log2",
>>>>>>>> the end of kegg-pathway.log1 reports a successful build [1].
>>>>>>>> However, when I query the postgres database with the SQL
>>>>>>>> "select * from genespathways", nothing is returned.
>>>>>>>>
>>>>>>>> But when I do the same thing for malariamine after loading
>>>>>>>> kegg-pathway data, I get the pathway-to-gene information listed
>>>>>>>> in [2]. So I compared the log output of my mine and malariamine,
>>>>>>>> and found that my mine has not built several of the indexes
>>>>>>>> listed in [3].
>>>>>>>>
>>>>>>>> Since I have used the same kegg-pathway source as malariamine,
>>>>>>>> what is the problem here?
>>>>>>>>
>>>>>>>> Any suggestions and comments are welcome! Thanks a lot!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Pengcheng Yang
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------
>>>>>>>> [1] log output reporting a successful build of my mine after
>>>>>>>> loading kegg-pathway
>>>>>>>> BUILD SUCCESSFUL
>>>>>>>> Total time: 21 seconds
>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>>> [Thread-8] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.common-tgt-items is being shutdown.
>>>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool db.production is being shutdown.
>>>>>>>>
>>>>>>>> [2] genespathways table from malariamine database.
>>>>>>>>
>>>>>>>>   pathways |  genes
>>>>>>>> ----------+---------
>>>>>>>>    2000002 | 1002796
>>>>>>>>    2000002 | 1003874
>>>>>>>>    2000002 | 1004075
>>>>>>>>
>>>>>>>> [3] log lines that appear for malariamine but not for my mine.
>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_secondaryidentifier ON Gene (secondaryIdentifier, organismid)
>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_symbol_org ON Gene (symbol, organismid)
>>>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_primaryidentifier ON Gene (primaryIdentifier)
>>>>>>>>   [integrate] Creating index: CREATE INDEX Organism__key_taxonid ON Organism (taxonId)
>>>>>>>>   [integrate] Creating index: CREATE INDEX SOTerm__key ON SOTerm (name, ontologyid)
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> dev at intermine.org
>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
>


