[InterMine Dev] Fwd: web app restarts

Joshua Heimbach jkh46 at cam.ac.uk
Tue May 12 13:40:26 BST 2015

Hi Joe,

The /service/sequence service was introduced in 1.3, so it exists in 
PhytoMine today. Here's an example of how to return the same data using 
both the old and the new endpoints:

(old) /service/regions/sequence:

    >chromosome_1:1..10 10bp C. reinhardtii

(supported) /regions/sequence:

    model="genomic" view="Chromosome.sequence.residues"><constraint
    path="Chromosome" op="LOOKUP" value="chromosome_1" extraValue="C.

The /regions/sequence endpoint doesn't support offsets or reverse 
complement sequences, but I can look into adding the functionality next 
week. Is it possible to transition users to the new endpoint to see how 
it affects PhytoMine's memory issues?
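
Until the server side supports it, reverse complement could be handled by 
the client. Below is a minimal Python sketch, not InterMine code: it 
assembles a /service/sequence URL in the shape of the FlyMine example 
quoted further down, swaps start and end when start > end, and 
reverse-complements the returned residues locally. The function names are 
hypothetical, and it assumes the endpoint takes only the start, end and 
query parameters; how the coordinates are indexed is not checked here.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Complement table for unambiguous bases plus N. Other IUPAC codes are
# left unchanged by str.translate (a simplification for this sketch).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def build_sequence_url(base_url, chromosome, organism, start, end):
    """Build a /service/sequence URL for a chromosome slice.

    base_url is assumed to be the full path to the service, e.g.
    "http://www.flymine.org/query/service/sequence".
    """
    # Same query shape as the FlyMine example in this thread; quoteattr
    # adds the surrounding quotes and escapes XML special characters.
    query_xml = (
        '<query model="genomic" view="Chromosome.sequence.residues">'
        '<constraint path="Chromosome" op="LOOKUP" '
        f"value={quoteattr(chromosome)} extraValue={quoteattr(organism)}/>"
        "</query>"
    )
    params = {"start": start, "end": end, "query": query_xml}
    return f"{base_url}?{urlencode(params)}"


def normalize_region(start, end):
    """Return (low, high, reverse), where reverse is True if start > end."""
    if start > end:
        return end, start, True
    return start, end, False


def reverse_complement(residues):
    """Reverse-complement a DNA string on the client side."""
    return residues.translate(_COMPLEMENT)[::-1]
```

A caller asking for 1000..10 would run normalize_region(1000, 10), request 
10..1000 with build_sequence_url, and pass the returned residues through 
reverse_complement when the reverse flag is set.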


On 08/05/15 18:48, Joe Carlson wrote:
> Hi Josh,
> This will keep us happy. It’s not in 1.4, right? (I forget what 
> version I had last merged with. 1.4.something)
> Is it too late for some enhancements? If start > end, then can you 
> return reverse complement sequence from end to start?
> Also, we often get people who want to see sequence upstream of the 
> CDS. Is there a way to provide a web service which is sequence from a 
> specific sequence with a prescribed upstream or downstream offset?
> Thanks,
> Joe
> On May 8, 2015, at 10:31 AM, Joshua Heimbach <jkh46 at cam.ac.uk> wrote:
>> Hi Joe,
>> I see your point about /service/regions/fasta. When we discovered 
>> that /service/regions/sequence had been removed rather than 
>> deprecated I pointed the old uri to what I thought was a comparable 
>> solution but I didn't choose the right one. service/regions/fasta is 
>> for finding features in a given range and that's not what you're 
>> looking for.
>> Can you take a look at 
>> http://iodocs.labs.intermine.org/flymine/docs#/ws-sequence/GET/sequence 
>> ? Given a query, start, and end parameter, it returns sequence data.
>> For example, sequence between 10 and 1,000 on chromosome 4:
>> www.flymine.org/query/service/sequence?start=10&end=1000&query=<query 
>> model="genomic" view="Chromosome.sequence.residues"><constraint 
>> path="Chromosome" op="LOOKUP" value="4" extraValue="D. 
>> melanogaster"/></query> 
>> The result is a JSON object rather than fasta, but if it's close to 
>> what you need then we can debug from there as it's up-to-date in our 
>> codebase. There's still a chance that switching URIs could ease the 
>> memory issues, particularly since GenomicRegionSequenceExportServlet 
>> has been off our radar since 1.3.1.
>> Thanks,
>> Josh
>> On 08/05/15 18:01, Joe Carlson wrote:
>>> Hi Josh,
>>> Thanks for the quick response.
>>> What I’d like to get is the chromosome sequence between two 
>>> specified positions. In earlier code, this was provided by the URI
>>> /service/regions/sequence?query={"regions":["chromosome\tstart\tend"],"organism":"short name"}
>>> (we needed to tweak this a bit for our purposes, but this was the 
>>> old endpoint in your code.) And it returned chromosome sequence in a 
>>> fasta format.
>>> In 1.4, I noticed that you changed this so that it returned the 
>>> sequence of features - specified in the URI as an extra parameter 
>>> such as genes, exons, introns, … - contained within these 
>>> coordinates. This is the same as /service/regions/fasta. This isn’t 
>>> quite what we want. I had tried to specify ‘chromosome’ as the 
>>> feature type but that was rejected. I could not find another 
>>> suitable endpoint.
>>> There is a routine GenomicRegionFastaService. I don’t know if this 
>>> is currently enabled in any service call, or whether it would 
>>> give me what I want.
>>> The old code works for us but the caching is causing us heap out of 
>>> memory errors. We’ve just recently determined that this was a cause 
>>> of our restarts and are about to turn off caching. But if you know 
>>> another way to get this information, let me know.
>>> Thanks,
>>> Joe
>>> On May 8, 2015, at 7:57 AM, Josh Heimbach <josh at intermine.org> wrote:
>>>> Hi Joe,
>>>> While adding web service documentation in intermine 1.3.1, the 
>>>> endpoint /service/regions/sequence was retired for the reason that 
>>>> it was duplicating information found elsewhere. Much of the 
>>>> codebase has been refactored and improved since 1.3, so perhaps 
>>>> using a different servlet might solve the memory issue.
>>>> Could you send me an example request that you would make to 
>>>> /service/regions/sequence along with its parameters? I'll look for 
>>>> a suitable alternative web service that returns the same information.
>>>> Thanks,
>>>> Josh
>>>>> -------- Forwarded Message --------
>>>>> Subject: [InterMine Dev] web app restarts
>>>>> Date: Thu, 07 May 2015 17:28:06 -0700
>>>>> From: Joe Carlson <jwcarlson at lbl.gov>
>>>>> To: dev at intermine.org, David M. Goodstein <dmgoodstein at lbl.gov>
>>>>> Hi Julie and gang
>>>>> We have just deployed our latest phytozome build based on 
>>>>> intermine 1.4.
>>>>> This is our first public release using Hikari.
>>>>> Our hopes were that going to Hikari would solve some of the problems
>>>>> we've been seeing with Tomcat restarts. We've traded emails about this
>>>>> in the past where we see that we have to restart every couple of hours
>>>>> when under load. (We run an 'are you alive' cron job every 3 minutes and
>>>>> force a restart if we don't get a response.)
>>>>> At the time I think we had attributed it to the postgres connection and
>>>>> we were looking forward to the Hikari pooling. It behaved well in internal
>>>>> use, but now that we're public we continue to see the restarts.
>>>>> I'm trying to do a little forensics to see what might be causing them.
>>>>> I'm seeing "OutOfMemoryError: Java heap space", typically after a call
>>>>> to retrieve the genomic sequence of a region
>>>>> (service/regions/sequence). I had noticed that you had removed this
>>>>> service in 1.4. I restored it since we're making use of it to deliver
>>>>> sequence to our main web portal. Did you remove this because you had
>>>>> seen it as being problematic?
>>>>> At this point, I'm not absolutely sure this is the source of the
>>>>> restarts but I'm very suspicious of
>>>>> org.intermine.bio.web.export.GenomicRegionSequenceExporter. There is a
>>>>> static map of entire chromosomes that is being stored. The substring is
>>>>> retrieved by calling substring on elements of this map. This may work
>>>>> for smaller mines but we have enough sequence in our database that I
>>>>> suspect this is part of our problem.
>>>>> Was this web service removed deliberately? Is there something to replace
>>>>> it? As I recall, the other sequence retrieval services I found only
>>>>> retrieved the sequence for specific features and not chromosome slices.
>>>>> Thanks,
>>>>> Joe Carlson
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev at intermine.org
>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
