[InterMine Dev] Querying GO terms

Julie Sullivan julie at flymine.org
Tue Sep 13 15:10:33 BST 2011


Hi Brian

That's definitely wrong: each protein should have at least a primary accession,
and PPARG should have a gene.

Did you split up the UniProt file into Swiss-Prot and TrEMBL?

	http://intermine.org/wiki/UniProt
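
A quick way to check the split is a small script like the one below. It is only a minimal sketch: it assumes the source expects one Swiss-Prot and one TrEMBL file per organism, named <taxonId>_uniprot_sprot.xml and <taxonId>_uniprot_trembl.xml. That naming, and the helper name missing_uniprot_files, are assumptions for illustration, so check them against the wiki page above.

```python
from pathlib import Path

# Assumed naming convention (verify against the UniProt wiki page):
#   <taxonId>_uniprot_sprot.xml and <taxonId>_uniprot_trembl.xml
EXPECTED_SUFFIXES = ("uniprot_sprot.xml", "uniprot_trembl.xml")


def missing_uniprot_files(data_dir, taxon_ids):
    """Return the expected per-organism file names that are absent from data_dir."""
    directory = Path(data_dir)
    missing = []
    for taxon_id in taxon_ids:
        for suffix in EXPECTED_SUFFIXES:
            name = f"{taxon_id}_{suffix}"
            if not (directory / name).is_file():
                missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory and taxon ID as used in the project.xml entry quoted in this thread.
    for name in missing_uniprot_files("/interMineData/uniprot/current/", ["9606"]):
        print("missing:", name)
```

If this prints any file names for the directory given in src.data.dir, the loader may be finding nothing it recognises, which would fit a load that finishes in a few seconds.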

Cheers
Julie

On 09/09/11 15:50, Lewis, Brian Andrew wrote:
> Julie -
>
>
>
> I just did a query using the template Proteins -> GO Terms for "PPARG_HUMAN" and got no results back. I am getting the annotation files from here: http://www.geneontology.org/GO.downloads.annotations.shtml  One weird thing is that my protein table in the nntcmine database looks like this:
>
>
>
>  secondaryidentifier | uniprotaccession | uniprotname | length | ecnumber | primaryaccession | molecularweight |           md5checksum            | symbol | primaryidentifier |    id    | isfragment | name | isuniprotcanonical | organismid | sequenceid |              class
> ---------------------+------------------+-------------+--------+----------+------------------+-----------------+----------------------------------+--------+-------------------+----------+------------+------+--------------------+------------+------------+---------------------------------
>                      |                  |             |        |          |                  |                 | c0c909c865c2ff61047fb7625dd34842 |        |                   | 77292975 |            |      |                    |   72000003 |   77292976 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 980432b31c6aca424fb5ed0db2978eec |        |                   | 77293049 |            |      |                    |   72000003 |   77293050 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 2efcc2ea8a76a3e4fa140be1f0448d30 |        |                   | 77293087 |            |      |                    |   72000003 |   77293088 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | fd33b774741fbba9017a0aa6020789e9 |        |                   | 77293138 |            |      |                    |   72000003 |   77293139 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 76fe237e1debd515a7a0302278ed8cf7 |        |                   | 77293172 |            |      |                    |   72000003 |   77293173 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 431a7a0baac52b72e91d1848412baef8 |        |                   | 77293204 |            |      |                    |   72000003 |   77293205 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 3d86e0541c207b07c273071e4033b10c |        |                   | 77293326 |            |      |                    |   72000003 |   77293327 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 7870027cf865b39c673d6e920231f788 |        |                   | 77293396 |            |      |                    |   72000003 |   77293397 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | a9b09808cb1e8930421599177a66751b |        |                   | 77293462 |            |      |                    |   72000003 |   77293463 | org.intermine.model.bio.Protein
>                      |                  |             |        |          |                  |                 | 40a7e6e661b3fc7a863258bd21a64b4b |
>
>
>
> It kinda looks like some columns are missing to me... so I think that might be the problem.  Here is my UniProt source entry in project.xml:
>
>
>
>   <source name="uniprot" type="uniprot">
>     <property name="uniprot.organisms" value="9606"/>
>     <property name="createinterpro" value="true"/>
>     <property name="src.data.dir" location="/interMineData/uniprot/current/"/>
>     <property name="creatego" value="true"/>
>   </source>
>
>
>
> (The UniProt data source takes about 14 seconds tops to load and gives no errors.)
>
>
>
> Thanks,
>
> ~ Brian
>
>
>
>
>
> Date: Thu, 8 Sep 2011 17:17:47 +0100
> From: julie at flymine.org
> To: dev at intermine.org
> Subject: Re: [InterMine Dev] Querying GO terms
> Message-ID: <0b85c558b7ed563f62079db15278c890.squirrel at webmail.flymine.org>
> Content-Type: text/plain;charset=iso-8859-1
>
>
>
> Where are you getting the annotation files from?  I ask because we have human GO data configured for UniProt, which assigns the data to proteins, not genes:
>
>
>
> http://intermine.org/browser/branches/intermine_0_97/bio/sources/go-annotation/main/resources/go-annotation_config.properties
>
>
>
> Although, if you have UniProt loaded, there is a postprocess that should copy over the GO objects to the related genes.
>
>
>
> Can you check to see if you have proteins with GO terms?  That would narrow down the problem a bit for me.
>
>


