[InterMine Dev] Any way to pass an input file to a post-process task?

Joe Carlson jwcarlson at lbl.gov
Thu Jul 21 00:09:07 BST 2016


Just my 2 cents of advice:

You should probably put the InterPro accession information into your Chado database by adding another feature_dbxref for each domain. That way you only need to load it once, rather than every time you build a mine, and it stays in your reference database. In my mine, loading is a very long process, so anything that lets me do the work in advance is a good thing.
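
To make that concrete, here is a rough Python sketch of the idea, not a tested loader. It uses an in-memory SQLite database with cut-down stand-ins for the Chado db, dbxref, feature and feature_dbxref tables (the real tables have more columns and constraints). The helper names link_interpro and interpro_for, and the IDs PF00000 and IPR000000, are hypothetical placeholders I made up for illustration.

```python
import sqlite3

# Cut-down stand-ins for four Chado tables. Assumption: only the columns
# needed for this sketch are included; real Chado has more.
SCHEMA = """
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id INTEGER NOT NULL REFERENCES db(db_id),
    accession TEXT NOT NULL,
    UNIQUE (db_id, accession)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL
);
CREATE TABLE feature_dbxref (
    feature_dbxref_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    dbxref_id INTEGER NOT NULL REFERENCES dbxref(dbxref_id),
    UNIQUE (feature_id, dbxref_id)
);
"""


def link_interpro(conn, domain_uniquename, interpro_accession):
    """Attach an InterPro accession to an existing domain feature (hypothetical helper)."""
    cur = conn.cursor()
    # Make sure there is a db row for InterPro, then a dbxref for the accession.
    cur.execute("INSERT OR IGNORE INTO db (name) VALUES ('InterPro')")
    db_id = cur.execute(
        "SELECT db_id FROM db WHERE name = 'InterPro'"
    ).fetchone()[0]
    cur.execute(
        "INSERT OR IGNORE INTO dbxref (db_id, accession) VALUES (?, ?)",
        (db_id, interpro_accession),
    )
    dbxref_id = cur.execute(
        "SELECT dbxref_id FROM dbxref WHERE db_id = ? AND accession = ?",
        (db_id, interpro_accession),
    ).fetchone()[0]
    # Look up the domain feature; refuse to link to something that is not loaded.
    row = cur.execute(
        "SELECT feature_id FROM feature WHERE uniquename = ?",
        (domain_uniquename,),
    ).fetchone()
    if row is None:
        raise KeyError(domain_uniquename)
    # The UNIQUE constraint plus OR IGNORE makes re-running the load harmless.
    cur.execute(
        "INSERT OR IGNORE INTO feature_dbxref (feature_id, dbxref_id) VALUES (?, ?)",
        (row[0], dbxref_id),
    )
    conn.commit()


def interpro_for(conn, domain_uniquename):
    """Return the InterPro accessions linked to a domain feature (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT x.accession
        FROM feature f
        JOIN feature_dbxref fx ON fx.feature_id = f.feature_id
        JOIN dbxref x ON x.dbxref_id = fx.dbxref_id
        JOIN db d ON d.db_id = x.db_id
        WHERE f.uniquename = ? AND d.name = 'InterPro'
        ORDER BY x.accession
        """,
        (domain_uniquename,),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO feature (uniquename) VALUES ('PF00000')")  # placeholder domain ID
link_interpro(conn, "PF00000", "IPR000000")  # placeholder InterPro accession
```

Once the links are in the reference database, the mine build can pick them up from the Chado source along with everything else, with no separate input file needed at post-process time.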

Joe

> On Jul 20, 2016, at 3:45 PM, Sam Hokin <shokin at ncgr.org> wrote:
> 
> Yeah, I could stuff it into a source-specific post-process task for do-sources, but I have many processors for this given source, and do-sources ends up running the post-processor for every one of them (see my previous post to this list).
> 
> Also, it's actually a pretty generic post-processor: it just associates InterPro data with protein domains based on their ID (PF*, SM*, TIGR*, PIRSF*, GENE3D*), which is what I happen to get from my Chado data source. In addition to associating the InterPro accession, I use this to associate domain names (e.g. "TIR") with the protein domains, which I didn't otherwise have and which biologists would want to search on.
> 
> And yes, it is shocking that I've had to hack PostProcessOperationsTask rather than edit a mine-specific XML file. I've got all sorts of stuff floating under bio/postprocess now that I'd prefer to have in an external place like bio/sources/*. Basically, I think the architecture of sources should have been repeated for post-processors.
> 
> It seems to me that a post-processor could very well want an input file specified like I do. I expect it's just an oversight because one hasn't been needed yet. I can hack the code, but I thought I'd find out what's up first.
> 
> On 07/20/2016 03:55 PM, justincc at intermine.org wrote:
>> Hi Sam,
>> 
>> Are you trying to add a post processing operation to a <source> or a new <post-process> step in <post-processing>?
>> 
>> PostProcessOperationsTask handles <post-process> steps (and hardcodes all possible steps such as create-references which is really
>> quite shocking).
>> 
>> However, do you really want post-processing on an individual <source> (confusing, I know)?  This is done by providing a class
>> in that source's directory that extends org.intermine.postprocess.PostProcessor.  This should accept the properties in <source> as
>> usual (as described at [1]).  One existing example is BioPAXPostProcess.java
>> 
>> [1] https://intermine.readthedocs.io/en/latest/database/data-sources/custom/


