[InterMine Dev] Prevent post-processor from running multiple times when datasource has multiple source defs in project.xml?

Sam Hokin shokin at ncgr.org
Mon Apr 18 15:58:19 BST 2016

That's certainly an option, Sergio, but this post-processor specifically uses the data model additions that are defined in my
datasource (bio/sources/legfed/legfed_additions.xml). It would be of no use to the general public as a standalone processor. It's a
perfect example of a datasource-specific post-processor that should be run by do-sources.

So, I can get by, but my issue is at the app design level: I think running a post-processor every time you have a data source
integration entry in project.xml is bad design. It goes against the idea of being able to use a data source with multiple
integration processors (I have a total of eight so far), since one would never want to run a post-processor more than once. To me, it
makes sense to define the post-processor under the legfed datasource and for it to be run once from do-sources.
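To make the difference concrete, here is a minimal Java sketch contrasting the two scheduling policies discussed above: one post-processor run per source entry in project.xml (the behaviour described in this thread) versus one run per distinct datasource type (the proposal). This is not InterMine code. `PostProcessScheduler`, `SourceEntry`, `perEntry`, `oncePerType` and the entry names are all hypothetical, and the sketch only models which runs would be scheduled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PostProcessScheduler {

    // Hypothetical stand-in for a <source name="..." type="..."> entry in project.xml.
    record SourceEntry(String name, String type) { }

    // Behaviour described in the thread: one post-processor run per source entry,
    // so a datasource type listed N times is post-processed N times.
    static List<String> perEntry(List<SourceEntry> sources) {
        List<String> runs = new ArrayList<>();
        for (SourceEntry s : sources) {
            runs.add(s.type());
        }
        return runs;
    }

    // Proposed behaviour: each datasource type's post-processor runs once,
    // in the order the type first appears (LinkedHashSet keeps insertion order).
    static List<String> oncePerType(List<SourceEntry> sources) {
        Set<String> seen = new LinkedHashSet<>();
        for (SourceEntry s : sources) {
            seen.add(s.type());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Hypothetical project.xml entries: three legfed integration entries, one other source.
        List<SourceEntry> sources = List.of(
                new SourceEntry("legfed-1", "legfed"),
                new SourceEntry("legfed-2", "legfed"),
                new SourceEntry("legfed-3", "legfed"),
                new SourceEntry("uniprot", "uniprot"));
        System.out.println(perEntry(sources));    // [legfed, legfed, legfed, uniprot]
        System.out.println(oncePerType(sources)); // [legfed, uniprot]
    }
}
```

Deduplicating by type while preserving first-seen order keeps the post-processing sequence otherwise unchanged, which is the "run once from do-sources" behaviour argued for here.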

As I said, it seems to me that FlyMine's chado-db post-processor, FlyBasePostProcess.java, must get run up to six times, since that
datasource has six processors (FlyBaseProcessor.java, ModEncodeFeatureProcessor.java, ModEncodeMetaDataProcessor.java,
SequenceProcessor.java, StockProcessor.java, WormBaseProcessor.java), which could be run separately. So, I'd have thought that you
folks would have already run into this.

Sorry about the verbose justification, but yes, I think this issue deserves a ticket. :)


On 04/18/2016 08:25 AM, sergio contrino wrote:
> dear sam,
> would it be reasonable to add a specific post-process for your chado data, and add it to the list of post-processes at the end of
> your project file (while removing the post-processes involved in the do-sources one)?
> depending on what your post-process does, you could refer to different cases already in the repository.
> if this is not reasonable, i'll make a ticket.
> thanks!
> sergio
