[InterMine Dev] read userprofile

Julie Sullivan julie at flymine.org
Thu Feb 2 14:00:42 GMT 2012


James

Can you send me your UniprotEntry.java file too?

All of the code on the branches and trunk loads mouse proteins correctly. I've
looked at the diff and don't see any obvious reason why yours would fail to load 
the sequences, publications and organisms for some of your proteins.

But if you send me your UniprotEntry code I'll run the build locally and see if 
I can figure out what's going on.

Cheers
Julie


On 02/02/12 12:26, James Blackshaw wrote:
> Comparing your Converter to ours, we have:
>
> * Renamed some of the fields to fit better with our naming convention
> (PubMed_ID instead of pubMedId and so on)
> * Added links to KeggEC. (line 365)
> * Changed the way we handle fragments (line 273)
> * Stopped isoforms being added to entries; they were causing loading
> errors, and we make a new record for each isoform instead. (line 307)
> * Disabled FlyBase resolver as it was causing duplicates to occur and
> we don't need it.
> * Disabled the isoform checker at line 538
> * Changed the keywords section at line 612
>
> Unless I altered the trunk converter a fair bit, switching to it is likely to
> introduce more errors. Anthony is away and I'd prefer to have checked some of
> his changes with him, but I know the reasoning for most of them. I've attached
> a comparison file of the differences between the UniProt converters.
>
> -James
>
> On 01/02/2012 16:48, julie at flymine.org wrote:
>> Using the same mouse file, I used the 0.97 UniProt converter and got
>> similar results. I have organism values for all genes and proteins.
>>
>> So what's different about your UniProt converter? You've made a few
>> requests over the years (fragments, etc) that I've incorporated into our
>> UniProt converter. Maybe you can abandon your customisations and just use
>> the one on the trunk?
>>
>> If there are more differences, I think it would be easiest if I added
>> those changes to the core code and you use that. You seem to run into
>> problems that I can't reproduce; if we're using the same code, that won't
>> happen.
>>
>> Let me know what you think!
>>
>> ~~~
>>
>> flymine=# select * from protein where organismid is null;
>> secondaryidentifier | uniprotaccession | uniprotname | length | ecnumber
>> | primaryaccession | genbankidentifier | molecularweight | md5check
>> uenceid | class
>> ---------------------+------------------+-------------+--------+----------+------------------+-------------------+-----------------+---------
>> (0 rows)
>>
>> flymine=# select count(*) from protein;
>> count
>> -------
>> 24002
>> (1 row)
>>
>> flymine=# select count(*) from gene;
>> count
>> -------
>> 16115
>> (1 row)
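The organism check above could be widened to cover the other symptoms James reports further down the thread (missing length and sequence). What follows is a minimal sketch, not InterMine code. It uses Python's sqlite3 module as a stand-in for the PostgreSQL production database. The table and column names `protein`, `primaryaccession`, `organismid` and `length` are taken from the psql output above; `sequenceid` is only inferred from the truncated header and is an assumption, and the sample rows and their accessions are made up.

```python
import sqlite3

# Same idea as the psql check above, widened to the other fields reported
# as missing. `sequenceid` is an assumed column name.
INCOMPLETE_PROTEINS = """
    SELECT primaryaccession
    FROM protein
    WHERE organismid IS NULL
       OR length IS NULL
       OR sequenceid IS NULL
    ORDER BY primaryaccession
"""


def find_incomplete(conn):
    """Return accessions of proteins missing organism, length or sequence."""
    return [row[0] for row in conn.execute(INCOMPLETE_PROTEINS)]


def demo_connection():
    """Build an in-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein ("
        "primaryaccession TEXT, organismid INTEGER, "
        "length INTEGER, sequenceid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?)",
        [
            ("P00001", 1, 120, 10),        # complete record
            ("P00002", None, None, None),  # nothing linked
            ("P00003", 1, None, 11),       # only length missing
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_incomplete(demo_connection()))
```

Against the real database, the same SELECT could be pasted into psql directly; the Python wrapper is only there to make the sketch self-contained.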
>>
>>> Sure. You can get them at:
>>> http://mitominer.mrc-mbu.cam.ac.uk/support/sites/default/files/downloads/uniprot_config.properties
>>>
>>> http://mitominer.mrc-mbu.cam.ac.uk/support/sites/default/files/downloads/10090_uniprot_sprot.xml
>>>
>>>
>>> Be warned, the SwissProt file is about 310 MB in size.
>>>
>>> Regards,
>>> James
>>>
>>> On 01/02/2012 12:02, julie at flymine.org wrote:
>>>> Hi James
>>>>
>>>> Can you send me the SwissProt file for mouse you are using, plus your
>>>> UniProt config file?
>>>>
>>>> Thanks!
>>>> Julie
>>>>
>>>>> We've fixed this by upping the RAM on the release server. We're
>>>>> intending to move to 0.98 for the next release of MitoMiner but we'd
>>>>> rather have a stable release right now. Our current issue is that a
>>>>> number of UniProt entries don't fully load their data. We've got 1005
>>>>> records with no length or sequence fields, and their links to
>>>>> Organism and Publication are missing. If we load the records
>>>>> individually on a test server, they populate the database normally.
>>>>>
>>>>> -James
>>>>>
>>>>> Examples:
>>>>>
>>>>> A2A6R5
>>>>> A2A6T3
>>>>> A2A7A9
>>>>> A2A8C9
>>>>> A2AK36
>>>>> A2AK69
>>>>> A2AVQ8
>>>>> A2P2R3
>>>>> A6ND55
>>>>> A6NK59
>>>>> A8CG34
>>>>> A8MQA3
>>>>> A8MQN0
>>>>> A8MQT2
>>>>> B1AR25
>>>>> B2RC85
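One way to narrow this down might be to confirm that the source XML actually contains an organism, a sequence and references for the accessions listed above, before suspecting the converter. The sketch below streams a UniProt XML file with the standard library and reports what each matching entry carries. It assumes the usual UniProt XML namespace (`http://uniprot.org/uniprot`) and that `organism`, `reference` and `sequence` are direct children of `entry`. The function name `check_entries` and the sample entries (`X00001`, `X00002`) are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML files (assumed to apply to the file in question).
NS = "{http://uniprot.org/uniprot}"


def check_entries(source, accessions):
    """Report, per wanted accession, what its <entry> contains in the XML."""
    wanted = set(accessions)
    found = {}
    # Stream the file so a ~310 MB download does not need to fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "entry":
            continue
        ids = {a.text for a in elem.findall(NS + "accession")}
        hits = ids & wanted
        if hits:
            seq = elem.find(NS + "sequence")
            report = {
                "organism": elem.find(NS + "organism") is not None,
                "sequence": seq is not None and bool((seq.text or "").strip()),
                "publications": len(elem.findall(NS + "reference")),
            }
            for acc in hits:
                found[acc] = report
        elem.clear()  # drop the finished entry's children to save memory
    return found


# Made-up sample: one complete entry, one with nothing but an accession.
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
<entry>
  <accession>X00001</accession>
  <organism><name type="scientific">Mus musculus</name></organism>
  <reference key="1"/>
  <sequence length="4">MKVL</sequence>
</entry>
<entry>
  <accession>X00002</accession>
</entry>
</uniprot>"""

if __name__ == "__main__":
    for acc, report in sorted(check_entries(io.BytesIO(SAMPLE), ["X00001", "X00002"]).items()):
        print(acc, report)
```

To use it on real data, pass the path to the SwissProt XML file and the accessions from the list above in place of the sample. Any accession absent from the result is not in the file at all.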
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 30/01/2012 15:18, Julie Sullivan wrote:
>>>>>> Hmm, that should work; it doesn't like something in your userprofile.
>>>>>>
>>>>>> Can you take a look in your webapp log files and tell me what error
>>>>>> messages you see? Maybe you can just send them to me? You want the
>>>>>> error messages with the timestamp matching when you tried to access
>>>>>> your webapp.
>>>>>>
>>>>>>
>>>>>> $TOMCAT/intermine.log
>>>>>> $TOMCAT/logs/catalina.out
>>>>>> $TOMCAT/logs/localhost-$DATE
>>>>>>
>>>>>>
>>>>>> I know you have custom code and it's time consuming, but I would start
>>>>>> thinking about upgrading to 0.98. You won't have to upgrade your
>>>>>> userprofile at all! Also, you don't get that "blank page" anymore.
>>>>>> Instead, the webapp gives you a nice helpful error message to let you
>>>>>> know what's gone wrong.
>>>>>>
>>>>>> On 30/01/12 15:05, J.A. Blackshaw wrote:
>>>>>>> Yes, I can. My first thought was to read it on that one, dump the
>>>>>>> userprofile there and export it to the release server. However, if I do
>>>>>>> that, I get a blank website when I release the webapp. This doesn't
>>>>>>> happen if I use a userprofile database I create from scratch.
>>>>>>>
>>>>>>> -James
>>>>>>>
>>>>>>> On Jan 30 2012, Julie Sullivan wrote:
>>>>>>>
>>>>>>>> Can you write/read the XML on the dev server?
>>>>>>>>
>>>>>>>> On 30/01/12 12:45, jab250 wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> The MitoMiner Userprofile database currently stands at around 112MB.
>>>>>>>>> I've been trying to port it according to the instructions on the
>>>>>>>>> Intermine website, but it's needing around 8 GB of memory to do so,
>>>>>>>>> which means upgrading the release server. Is it normal to require so
>>>>>>>>> much memory? I've tried copying the database
>>>>>>>> No, that's not really normal.
>>>>>>>>
>>>>>>>>> directly from the development server, but even when the data are the
>>>>>>>>> same in both records databases, the website fails to release. If I
>>>>>>>>> copy the userprofile
>>>>>>>> Do you still have the error messages from when this failed?
>>>>>>>>
>>>>>>>>> database over as a gzip file and unpack it, does everything including
>>>>>>>>> the name of each database have to be identical? The data are, but the
>>>>>>>>> database names are different.
>>>>>>>> The database names can be different; the webapp uses the properties file
>>>>>>>> (.intermine/mitomine.properties) to get the database names.
>>>>>>>>
>>>>>>>> What's important is that the InterMine IDs of the objects are the same,
>>>>>>>> e.g. `gene.id`, the ID you see in the URL. This ID is what's used in the
>>>>>>>> userprofile database. It changes with each build of the database, and
>>>>>>>> that's what the write/read userprofile XML process updates.
>>>>>>>>
>>>>>>>> So you should be able to do this:
>>>>>>>>
>>>>>>>> 1. write/read userprofile database XML on dev server
>>>>>>>> 2. dump userprofile database, copy dump over to release server, restore
>>>>>>>>    to userprofile db
>>>>>>>> 3. release new webapp on release server
>>>>>>>>    - new production database
>>>>>>>>    - new userprofile
>>>>>>>>    - make sure mitomine.properties has correct database names
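Step 2 of the list above (dump, copy, restore) could be scripted roughly as follows. This is a sketch under assumptions, not an InterMine tool. It assumes the userprofile lives in PostgreSQL, that an empty target database already exists on the release server, and that `pg_dump`, `scp`, `ssh` and `psql` are available. The database names and host in the example are hypothetical. The function only builds the commands and does not run anything.

```python
import shlex


def transfer_commands(dev_db, release_db, release_host, dump_file="userprofile.sql"):
    """Return the commands for step 2 as argument lists (nothing is executed)."""
    restore = "psql --dbname {} --file {}".format(
        shlex.quote(release_db), shlex.quote(dump_file)
    )
    return [
        # Dump the dev userprofile as plain SQL, without ownership statements.
        ["pg_dump", "--no-owner", "--file", dump_file, dev_db],
        # Copy the dump to the release server.
        ["scp", dump_file, "{}:{}".format(release_host, dump_file)],
        # Restore into the existing, empty userprofile database there.
        ["ssh", release_host, restore],
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute the ones from mitomine.properties.
    for cmd in transfer_commands(
        "userprofile-dev", "userprofile-release", "release.example.org"
    ):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the database names against mitomine.properties before anything is overwritten.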
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> James
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> dev mailing list
>>>>>>>>> dev at intermine.org
>>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>
>>>>>>>>
>>>
>>>
>
>


