[InterMine Dev] kegg-pathway load: genespathways table is empty

Julie Sullivan julie at flymine.org
Mon Aug 3 10:13:45 BST 2015


Also, probably a typo but the file you want to edit is 
"kegg_config.properties" as listed on the docs:

https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties

The file you listed below (kegg-pathway_keys.properties) is the keys 
file which governs the integration keys used in the build.
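
As a rough sketch only (this is not part of InterMine), a small Python
check like the one below could confirm whether a taxon ID is mapped to a
KEGG abbreviation in the config and which data file name that implies. It
assumes the "<abbreviation>.taxonId=<id>" key form quoted further down in
this thread and the "<abbreviation>_gene_map.tab" naming mentioned there;
the function names and the "org"/1111 entry are hypothetical placeholders,
so check the real kegg_config.properties for the exact format.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def abbreviations_for_taxon(props, taxon_id):
    """Return the abbreviations whose '<abbr>.taxonId' value equals taxon_id.

    The '<abbr>.taxonId' key form is an assumption taken from this thread.
    """
    suffix = ".taxonId"
    return sorted(
        key[: -len(suffix)]
        for key, value in props.items()
        if key.endswith(suffix) and value == str(taxon_id)
    )


def expected_gene_map_files(props, taxon_id):
    """Return the '<abbr>_gene_map.tab' names implied by the configuration."""
    return [abbr + "_gene_map.tab" for abbr in abbreviations_for_taxon(props, taxon_id)]


def parse_gene_map_line(line):
    """Split one 'GeneID<tab>mapid<space>mapid...' line into (gene, [mapids])."""
    gene_id, _, maps = line.rstrip("\n").partition("\t")
    return gene_id, maps.split()


if __name__ == "__main__":
    # Hypothetical config content: 'dme' is the fly example from this thread,
    # 'org'/1111 stands in for the made-up organism.
    sample = "# fly\ndme.taxonId = 7227\norg.taxonId=1111\n"
    props = parse_properties(sample)
    for taxon in (7227, 1111, 9999):
        files = expected_gene_map_files(props, taxon)
        print(taxon, files if files else "not configured")
```

If the taxon ID given in kegg.organisms comes back as "not configured", or
the implied file name does not match the file in src.data.dir, that would
be consistent with an empty genespathways table.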

On 03/08/15 10:10, Julie Sullivan wrote:
> Can you send me the configuration you added to the file?
>
> This is the list of KEGG organisms and associated abbreviations:
>
>      http://www.genome.jp/kegg/catalog/org_list.html
>
>
> On 03/08/15 10:06, Pengcheng Yang wrote:
>>
>> Hi Chen Yian and Julie Sullivan,
>>
>> Thank you for your reply and the information.
>>
>> I have tried both of the following methods, but the table "genespathways"
>> remains empty.
>> 1) Adding the org.taxonId=1111 to the
>> bio/sources/kegg-pathway/resources/kegg-pathway_keys.properties file.
>> Here the taxonId and organism name are made up for confidentiality
>> reasons.
>> 2) Removing the "kegg.organisms" property from the project.xml file.
>>
>> I have checked some related information that may be useful:
>> 1) "select  * from pathway" returns the expected information.
>> 2) I compared the kegg-pathway.log1 files from mymine and malariamine and
>> found the following lines, which are specific to mymine and do not appear
>> in malariamine's kegg-pathway.log1:
>>
>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as
>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>> outdated.
>> 3172,3173d3169
>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>> org/intermine/model/bio/ProteinRegion.class is outdated.
>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as
>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>> 3217c3213
>> <     [javac]
>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegion.java
>>
>>
>> <     [javac]
>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinDomainRegionShadow.java
>>
>>
>> 3352,3353d3345
>> <     [javac]
>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegion.java
>>
>>
>> <     [javac]
>> /home/pengchy/Soft/05.SystemBiology/intermine/mymine/dbmodel/build/gen/src/org/intermine/model/bio/ProteinRegionShadow.java
>>
>>
>> 3540,3541d3531
>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegion.class added as
>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>> <   [lib:jar] org/intermine/model/bio/ProteinDomainRegionShadow.class
>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>> outdated.
>> 3543,3544d3532
>> <   [lib:jar] org/intermine/model/bio/ProteinRegion.class added as
>> org/intermine/model/bio/ProteinRegion.class is outdated.
>> <   [lib:jar] org/intermine/model/bio/ProteinRegionShadow.class added as
>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>> <   [lib:jar] adding entry
>> org/intermine/model/bio/ProteinDomainRegion.class
>> <   [lib:jar] adding entry
>> org/intermine/model/bio/ProteinDomainRegionShadow.class
>> 3720,3721d3705
>> <   [lib:jar] adding entry org/intermine/model/bio/ProteinRegion.class
>> <   [lib:jar] adding entry
>> org/intermine/model/bio/ProteinRegionShadow.class
>> <     [javac] org/intermine/model/bio/ProteinDomainRegion.java added as
>> org/intermine/model/bio/ProteinDomainRegion.class is outdated.
>> <     [javac] org/intermine/model/bio/ProteinDomainRegionShadow.java
>> added as org/intermine/model/bio/ProteinDomainRegionShadow.class is
>> outdated.
>> 7230,7231d7211
>> <     [javac] org/intermine/model/bio/ProteinRegion.java added as
>> org/intermine/model/bio/ProteinRegion.class is outdated.
>> <     [javac] org/intermine/model/bio/ProteinRegionShadow.java added as
>> org/intermine/model/bio/ProteinRegionShadow.class is outdated.
>> 7275c7255
>>
>> Thanks a lot!
>>
>> Best,
>> Pengcheng Yang
>>
>>
>>
>> On 2015/8/3 16:14, Julie Sullivan wrote:
>>> Here are the docs on the kegg source:
>>>
>>> http://intermine.readthedocs.org/en/latest/database/data-sources/library/pathways/kegg/
>>>
>>>
>>>
>>> KEGG uses its own prefix, which InterMine does not know. You have to
>>> configure this in the config file.
>>>
>>> e.g. KEGG uses "dme" for Drosophila melanogaster and the data file is
>>> named "dme_gene_map.tab".
>>>
>>> The reason malaria worked is that it is already configured:
>>>
>>> https://github.com/intermine/intermine/blob/master/bio/sources/kegg-pathway/main/resources/kegg_config.properties#L37
>>>
>>>
>>>
>>> You have two options:
>>>
>>> 1. remove the taxon ID from your project XML file, all genes will be
>>> loaded
>>>
>>> 2. configure the taxon ID in the kegg_config.properties
>>>
>>>
>>>
>>> On 03/08/15 08:55, Pengcheng Yang wrote:
>>>> Hi Julie Sullivan,
>>>>
>>>> Thank you for your reply.
>>>>
>>>> I listed the kegg-pathway part of the project.xml file for the two
>>>> mines.
>>>> They seem to differ only in the path and the organism.
>>>>
>>>> [1] The project.xml of my mine:
>>>> ----------------------
>>>> <source name="kegg-pathway" type="kegg-pathway">
>>>>         <property name="kegg.organisms" value="1111"/>
>>>>        <property name="src.data.dir" location="/path/to/mymine/kegg/"/>
>>>>      </source>
>>>>
>>>> [2] The project.xml of malariamine
>>>>      <source name="kegg-pathway" type="kegg-pathway">
>>>>        <property name="kegg.organisms" value="36329"/>
>>>>        <property name="src.data.dir"
>>>> location="/path/to/malaria/kegg/"/>
>>>>      </source>
>>>>
>>>> I have checked the org_gene_map.tab file; its format is indeed:
>>>> GeneID<tab>mapid<space>mapid<space>mapid
>>>>
>>>> Best,
>>>> Pengcheng Yang
>>>>
>>>> On 2015/8/3 15:32, Julie Sullivan wrote:
>>>>> Sorry you are having problems with the kegg source!
>>>>>
>>>>> Can you clarify what is different about the two project XML files?
>>>>>
>>>>> On 02/08/15 10:01, Pengcheng Yang wrote:
>>>>>> Hi InterMiner developers,
>>>>>>
>>>>>> Thank you to all who answered my questions. Here is another problem
>>>>>> that is blocking the deployment of my InterMine.
>>>>>>
>>>>>> To load kegg-pathway data, I set up the project.xml as in malariamine
>>>>>> and prepared the two files map_title.tab and org_gene_map.tab. When I
>>>>>> load the data using "ant -Dsource=kegg-pathway -v 1> kegg-pathway.log1
>>>>>> 2> kegg-pathway.log2", kegg-pathway.log1 ends with the output in [1].
>>>>>> However, when I query the postgres database with the SQL
>>>>>> "select * from genespathways", nothing is returned.
>>>>>>
>>>>>> But when I do the same thing for malariamine after loading
>>>>>> kegg-pathway data, I get the pathways-to-genes information listed in
>>>>>> [2]. So I compared the log output of my mine and malariamine, and
>>>>>> found that my mine has not built several of the indexes listed in [3].
>>>>>>
>>>>>> Since I used the same kegg-pathway source as malariamine, what is the
>>>>>> problem here?
>>>>>>
>>>>>> Any suggestions and comments are welcome! Thanks a lot!
>>>>>>
>>>>>> Best,
>>>>>> Pengcheng Yang
>>>>>>
>>>>>>
>>>>>> ---------------------------------
>>>>>> [1] build successful log information from my mine after load
>>>>>> kegg-pathway
>>>>>> BUILD SUCCESSFUL
>>>>>> Total time: 21 seconds
>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool
>>>>>> db.common-tgt-items is being shutdown.
>>>>>> [Thread-8] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool
>>>>>> db.common-tgt-items is being shutdown.
>>>>>> [Thread-16] INFO com.zaxxer.hikari.pool.HikariPool - HikariCP pool
>>>>>> db.production is being shutdown.
>>>>>>
>>>>>> [2] genespathways table from malariamine database.
>>>>>>
>>>>>>   pathways |  genes
>>>>>> ----------+---------
>>>>>>    2000002 | 1002796
>>>>>>    2000002 | 1003874
>>>>>>    2000002 | 1004075
>>>>>>
>>>>>> [3] log output that appears for malariamine but not for my mine.
>>>>>>   [integrate] Creating index: CREATE INDEX
>>>>>> Gene__key_secondaryidentifier
>>>>>> ON Gene (secondaryIdentifier, organismid)
>>>>>>   [integrate] Creating index: CREATE INDEX Gene__key_symbol_org ON
>>>>>> Gene
>>>>>> (symbol, organismid)
>>>>>>   [integrate] Creating index: CREATE INDEX
>>>>>> Gene__key_primaryidentifier
>>>>>> ON Gene (primaryIdentifier)
>>>>>>   [integrate] Creating index: CREATE INDEX Organism__key_taxonid ON
>>>>>> Organism (taxonId)
>>>>>>   [integrate] Creating index: CREATE INDEX SOTerm__key ON SOTerm
>>>>>> (name,
>>>>>> ontologyid)
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev at intermine.org
>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


