[InterMine Dev] seeking advice, best practices

Joel Richardson jer at informatics.jax.org
Tue Sep 13 21:08:05 BST 2011

Thanks Andrew. That makes sense.

On 9/13/11 1:41 PM, Vallejos, Andrew wrote:
> Hi Joel,
>
> Here is what I do in RatMine.
>
> Extending the BioEntity.  If I have a new core feature to add I first
> consult the Sequence Ontology to see if there is a relevant term to use.
> If there is, then I simply add the term to the Sequence Ontology file in
> my mine.  If there is not, or if I have some specific reason to have a
> new model, then I extend the model that makes the most sense.  For
> example, RatMine has SNPs, but I wanted to differentiate between rsSNPs
> and ssSNPs, so I created two subclasses of SNP and named them
> accordingly.  The added benefit is that someone can query for all
> SNPs or just the subclass they want.  I have not encountered any model
> conflicts myself.
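
The subclassing idea described above can be sketched in plain Python. This is only an illustration of why two subclasses allow querying for "all SNPs" or "just one kind"; it is not InterMine code and does not use InterMine's model format or query API. The class names SNP, RsSNP and SsSNP, the helper select, and the identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SNP:
    """Hypothetical parent class standing in for the SNP model class."""
    identifier: str


@dataclass
class RsSNP(SNP):
    """Hypothetical subclass for rsSNP records."""


@dataclass
class SsSNP(SNP):
    """Hypothetical subclass for ssSNP records."""


def select(features, cls):
    """Return every feature that is an instance of cls, subclasses included."""
    return [f for f in features if isinstance(f, cls)]


# Made-up identifiers, for illustration only.
features = [RsSNP("rs-example-1"), SsSNP("ss-example-1"), RsSNP("rs-example-2")]

all_snps = select(features, SNP)    # matches both subclasses
rs_only = select(features, RsSNP)   # matches only the rsSNP subclass
print(len(all_snps), len(rs_only))
```

Selecting on the parent class picks up both kinds of record, while selecting on a subclass narrows the result, which is the benefit described for the two SNP subclasses.
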
>
> For other data sources, I review the existing mines.  FlyMine and ModMine
> are the two I usually use.  If I am doing something completely novel
> then I just create a new top-level model.
>
> For source control I use Git.  I have two Git repos, ratmine and
> ratmine_bio_sources.  The ratmine repo is the main mine directory and
> lives in the top level of the InterMine checkout.  The
> ratmine_bio_sources repo resides in the bio/ directory, next to
> sources, and holds all RatMine-specific sources.  (Both repos are
> publicly available from GitHub.)
>
> Where I have had to modify the core InterMine code, I make notes and
> then diligently reapply those modifications after every checkout.
>
> I am not saying that my methods should be adopted as "best practices",
> but that is what I do.  I am also open to suggestions.
>
> -Andrew Vallejos
>
> -----Original Message-----
> From: dev-bounces at intermine.org [mailto:dev-bounces at intermine.org] On
> Behalf Of Joel Richardson
> Sent: Tuesday, September 13, 2011 8:56 AM
> To: dev at intermine.org
> Subject: [InterMine Dev] seeking advice, best practices
>
> Hi all,
>
> The more I dig into InterMine, the more possibilities I see
> but also the more questions I have. I'd really appreciate
> any help/insight/opinions as to the best ways to deal with
> the following issues.
>
> Core model + bio extensions. Is there flexibility here? Can any
> of this be changed without breaking things? Which parts?
> Reasons:
>     - a fair amount of stuff is irrelevant for our data and so
>     will remain unpopulated in the mine. I know we can just ignore these
>     parts (and that's fine for now), but it seems a bit awkward, e.g.,
>     to have "dead" classes available in the query builder.
>     - some aspects conflict with the data we have. For example,
>     neither Alleles nor Proteins are subclasses of BioEntity, and so
>     cannot have OntologyAnnotations. Again, I know we can define
>     our own subclasses of BioEntity (MGIAllele, MGIProtein, or whatever),
>     but that seems messy.
>
> A larger question is whether/how the different mines (at least, the
> InterMOD ones) coordinate their model extensions. I'm assuming everyone
> pretty much extends the core model for their own purposes, and it's
> a great strength of InterMine that this is possible. But it also
> raises issues for interoperability as the mines' models diverge.
>
> Source control/versioning. I'm wondering how people are approaching
> version control of their mines' components (config files, source
> code, etc.) as distinct from InterMine itself?
>
> Loading lots of data from a relational db. Most of our data
> will come out of MGI. There's lots of it and lots of different types.
> Should this be one big load or lots of little ones? Should the
> loads connect to the db directly, or should the db get dumped
> in Items XML format and we load that? If loading Items XML, is it
> better to load one big file, or a directory of smaller ones?
>
> Many thanks in advance,
> Joel


Joel Richardson, Ph.D.
Sr. Research Scientist
Mouse Genome Informatics
The Jackson Laboratory   Phone: (207) 288-6435
600 Main Street          Fax:   (207) 288-6132
Bar Harbor, Maine 04609  URL:   www.informatics.jax.org
