[InterMine Dev] read userprofile

James Blackshaw jab250 at mrc-mbu.cam.ac.uk
Thu Feb 2 12:26:23 GMT 2012


Comparing your Converter to ours, we have:

  * Renamed some of the fields to fit better with our naming convention
    (PubMed_ID instead of pubMedId and so on)
  * Added links to KeggEC. (line 365)
  * Changed the way we handle fragments (line 273)
  * Stopped isoforms being added to entries; they were causing loading
    errors, and we make a new record for each isoform instead. (line 307)
  * Disabled FlyBase resolver as it was causing duplicates to occur and
    we don't need it.
  * Disabled the isoform checker at line 538
  * Changed the keywords section at line 612

Unless I altered the current trunk converter a fair bit, switching to it is
likely just to introduce more errors. Anthony is away and I'd prefer to have
checked some of his changes with him, but I know the reasoning for most of
them. I've attached a comparison file of the differences between the two
UniProt converters.

-James

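For anyone wanting to spot the partially loaded entries mentioned further down the thread, a check along the lines of Julie's `organismid is null` query might look like the sketch below. It runs against a throwaway in-memory SQLite table rather than the real PostgreSQL production database, and the table is cut down to three columns (`primaryaccession`, `length`, `organismid`) taken from the psql output quoted below. The sample rows and the names `find_incomplete_proteins` and `build_demo_db` are made up for illustration.

```python
import sqlite3


def find_incomplete_proteins(conn):
    """Return accessions of proteins lacking a length or an organism link."""
    rows = conn.execute(
        "SELECT primaryaccession FROM protein "
        "WHERE length IS NULL OR organismid IS NULL "
        "ORDER BY primaryaccession"
    ).fetchall()
    return [accession for (accession,) in rows]


def build_demo_db():
    """Build a tiny in-memory table with invented rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein ("
        "primaryaccession TEXT, length INTEGER, organismid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("DEMO01", 120, 1),       # fully loaded
            ("DEMO02", None, None),   # no length, no organism
            ("DEMO03", 300, None),    # organism link missing
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_incomplete_proteins(build_demo_db()))
```

Against the real database the same `WHERE` clause could be run directly in psql; the Python wrapper is only there to make the example self-contained.
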
On 01/02/2012 16:48, julie at flymine.org wrote:
> Using the same mouse file, I used the 0.97 UniProt converter and got
> similar results.  I have organism values for all genes and proteins.
>
> So what's different about your UniProt converter?  You've made a few
> requests over the years (fragments, etc) that I've incorporated into our
> UniProt converter.  Maybe you can abandon your customisations and just use
> the one on the trunk?
>
> If there are more differences, I think it would be easiest if I added
> those changes to the core code and you used that.  You seem to run into
> problems that I can't reproduce; if we're using the same code, that won't
> happen.
>
> Let me know what you think!
>
> ~~~
>
> flymine=# select * from protein where organismid is null;
>   secondaryidentifier | uniprotaccession | uniprotname | length | ecnumber
> | primaryaccession | genbankidentifier | molecularweight | md5check
> uenceid | class
> ---------------------+------------------+-------------+--------+----------+------------------+-------------------+-----------------+---------
> --------+-------
> (0 rows)
>
> flymine=# select count(*) from protein;
>   count
> -------
>   24002
> (1 row)
>
> flymine=# select count(*) from gene;
>   count
> -------
>   16115
> (1 row)
>
>> Sure. You can get them at:
>> http://mitominer.mrc-mbu.cam.ac.uk/support/sites/default/files/downloads/uniprot_config.properties
>> http://mitominer.mrc-mbu.cam.ac.uk/support/sites/default/files/downloads/10090_uniprot_sprot.xml
>>
>> Be warned: the Swiss-Prot file is around 310 MB in size.
>>
>> Regards,
>> James
>>
>> On 01/02/2012 12:02, julie at flymine.org wrote:
>>> Hi James
>>>
>>> Can you send me the SwissProt file for mouse you are using, plus your
>>> UniProt config file?
>>>
>>> Thanks!
>>> Julie
>>>
>>>> We've fixed this by upping the RAM on the release server. We intend to
>>>> move to 0.98 for the next release of MitoMiner, but we'd rather have a
>>>> stable release right now. Our current issue is that a number of UniProt
>>>> entries don't fully load their data. We've got 1005 records that have no
>>>> length or sequence fields and are missing the links to Organism and
>>>> Publication. If we load these records individually on a test server,
>>>> they populate the database normally.
>>>>
>>>> -James
>>>>
>>>> Examples:
>>>>
>>>> A2A6R5
>>>> A2A6T3
>>>> A2A7A9
>>>> A2A8C9
>>>> A2AK36
>>>> A2AK69
>>>> A2AVQ8
>>>> A2P2R3
>>>> A6ND55
>>>> A6NK59
>>>> A8CG34
>>>> A8MQA3
>>>> A8MQN0
>>>> A8MQT2
>>>> B1AR25
>>>> B2RC85
>>>>
>>>>
>>>>
>>>>
>>>> On 30/01/2012 15:18, Julie Sullivan wrote:
>>>>> Hmm, that should work; it doesn't like something in your userprofile.
>>>>>
>>>>> Can you take a look in your webapp log files and tell me what error
>>>>> messages you see?  Maybe you can just send them to me?  You want the
>>>>> error messages with the timestamp matching when you tried to access
>>>>> your webapp.
>>>>>
>>>>>
>>>>>       $TOMCAT/intermine.log
>>>>>       $TOMCAT/logs/catalina.out
>>>>>       $TOMCAT/logs/localhost-$DATE
>>>>>
>>>>>
>>>>> I know you have custom code and it's time consuming, but I would start
>>>>> thinking about upgrading to 0.98.  You won't have to upgrade your
>>>>> userprofile at all! Also, you don't get that "blank page" anymore.
>>>>> Instead, the webapp gives you a nice helpful error message to let you
>>>>> know what's gone wrong.
>>>>>
>>>>> On 30/01/12 15:05, J.A. Blackshaw wrote:
>>>>>> Yes, I can. My first thought was to read it on that one, dump the
>>>>>> userprofile there and export it to the release server. However, if I
>>>>>> do that, I get a blank website when I release the webapp. This doesn't
>>>>>> happen if I use a userprofile database I create from scratch.
>>>>>>
>>>>>> -James
>>>>>>
>>>>>> On Jan 30 2012, Julie Sullivan wrote:
>>>>>>
>>>>>>> Can you write/read the XML on the dev server?
>>>>>>>
>>>>>>> On 30/01/12 12:45, jab250 wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The MitoMiner Userprofile database currently stands at around 112MB.
>>>>>>>> I've been trying to port it according to the instructions on the
>>>>>>>> Intermine website, but it's needing around 8 GB of memory to do so,
>>>>>>>> which means upgrading the release server. Is it normal to require so
>>>>>>>> much memory?
>>>>>>> No, that's not really normal.
>>>>>>>
>>>>>>>> I've tried copying the database directly from the development server,
>>>>>>>> but even when the data are the same in both records databases, the
>>>>>>>> website fails to release.
>>>>>>> Do you still have the error messages from when this failed?
>>>>>>>
>>>>>>>> If I copy the userprofile database over as a gzip file and unpack it,
>>>>>>>> does everything including the name of each database have to be
>>>>>>>> identical? The data are, but the database names are different.
>>>>>>> The database names can be different; the webapp uses the properties
>>>>>>> file (.intermine/mitomine.properties) to get the database names.
>>>>>>>
>>>>>>> What's important is that the InterMine IDs of the objects are the same,
>>>>>>> e.g. `gene.id`, the ID you see in the URL. This ID is what's used in the
>>>>>>> userprofile database. It changes with each build of the database, and
>>>>>>> that's what the write/read userprofile XML process updates.
>>>>>>>
>>>>>>> So you should be able to do this:
>>>>>>>
>>>>>>> 1. write/read userprofile database XML on dev server
>>>>>>> 2. dump userprofile database, copy dump over to release server,
>>>>>>>    restore to userprofile db
>>>>>>> 3. release new webapp on release server
>>>>>>>    - new production database
>>>>>>>    - new userprofile
>>>>>>>    - make sure mitomine.properties has correct database names
>>>>>>>> Regards,
>>>>>>>> James
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> dev at intermine.org
>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>
>>
>>

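The ID remapping Julie describes in the oldest message above (InterMine IDs change with every build, and the write/read userprofile XML step updates them) can be pictured with a toy example. This is not InterMine's actual code, only a sketch of the idea: saved items are looked up by a stable identifier and given whatever ID the new build assigned. The function name `remap_ids`, the dictionaries and the identifiers are all invented for illustration.

```python
def remap_ids(saved_ids, old_build, new_build):
    """Translate object IDs saved against an old build into new-build IDs.

    old_build and new_build each map a stable identifier (for example a
    primary accession) to the internal ID assigned in that build.
    Returns (remapped_ids, unresolved), where unresolved lists the
    identifiers (or raw IDs) that could not be found in the new build.
    """
    # Invert the old build so a saved ID can be turned back into its identifier.
    old_id_to_identifier = {obj_id: ident for ident, obj_id in old_build.items()}

    remapped, unresolved = [], []
    for obj_id in saved_ids:
        ident = old_id_to_identifier.get(obj_id)
        if ident is not None and ident in new_build:
            remapped.append(new_build[ident])
        else:
            # Either the saved ID was unknown or the object is gone in the new build.
            unresolved.append(ident if ident is not None else obj_id)
    return remapped, unresolved


if __name__ == "__main__":
    old_build = {"GENE_A": 1001, "GENE_B": 1002, "GENE_C": 1003}  # invented
    new_build = {"GENE_A": 5007, "GENE_B": 5001}                  # invented
    print(remap_ids([1001, 1003, 1002], old_build, new_build))
```

The point of the sketch is that copying the userprofile dump alone is not enough: the saved IDs only make sense against the build they were written for, so they have to be translated before the new webapp is released.
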
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: UniprotConverter.java
URL: <http://mail.intermine.org/pipermail/dev/attachments/20120202/54961130/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Diff-2.diff
URL: <http://mail.intermine.org/pipermail/dev/attachments/20120202/54961130/attachment-0003.ksh>

