[InterMine Dev] seeking advice, best practices
avallejos at mcw.edu
Tue Sep 13 18:41:14 BST 2011
Here is what I do in RatMine.
Extending the BioEntity. If I have a new core feature to add I first
consult the Sequence Ontology to see if there is a relevant term to use.
If there is, then I simply add the term to the Sequence Ontology file in
my mine. If there is not, or if I have some specific reason to have a
new model, then I extend the model that makes the most sense. For
example, RatMine has SNPs, but I wanted to differentiate between rsSNPs
and ssSNPs, so I created two subclasses of SNP and named them
accordingly. An added benefit is that someone can query for all
SNPs or just the subclass they want. I have not encountered any model
For other data sources, I review the existing mines. FlyMine and ModMine
are the two I usually use. If I am doing something completely novel,
then I just create a new top-level model.
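
As a rough illustration of the rsSNP/ssSNP split described above, here is
a minimal Python sketch of why subclassing allows both kinds of query. The
class names and the select() helper are hypothetical stand-ins and not part
of the InterMine API; in a real mine the subclasses would be declared in
the model rather than in Python.

```python
# Hypothetical stand-ins for model classes; this is not InterMine code.
class SequenceFeature:
    def __init__(self, identifier):
        self.identifier = identifier


class SNP(SequenceFeature):
    """Parent class: a query against SNP matches every subclass too."""


class RsSNP(SNP):
    """Reference SNP (rs identifier)."""


class SsSNP(SNP):
    """Submitted SNP (ss identifier)."""


def select(features, cls):
    """Return the features that are instances of cls or of any subclass."""
    return [f for f in features if isinstance(f, cls)]


# Made-up identifiers, for illustration only.
features = [RsSNP("rs0001"), SsSNP("ss0002"), RsSNP("rs0003")]

all_snps = select(features, SNP)    # both kinds of SNP
rs_only = select(features, RsSNP)   # only the rsSNP subclass
```

Querying the parent class returns all three objects, while querying RsSNP
returns only the two rs entries.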
For source control I use Git. I have two Git repos, ratmine and
ratmine_bio_sources. ratmine is the main mine directory that lives in
the top level of the InterMine checkout. ratmine_bio_sources resides in
the bio/ directory, next to sources. I place all RatMine-specific
sources in ratmine_bio_sources. (Both repos are publicly available from
Where I have had to modify the core InterMine code, I make notes and
then diligently reapply those modifications after every checkout.
I am not saying that my methods should be adopted as "best practices",
but that is what I do. I am also open to suggestions.
From: dev-bounces at intermine.org [mailto:dev-bounces at intermine.org] On
Behalf Of Joel Richardson
Sent: Tuesday, September 13, 2011 8:56 AM
To: dev at intermine.org
Subject: [InterMine Dev] seeking advice, best practices
The more I dig into Intermine, the more possibilities I see
but also the more questions I have. I'd really appreciate
any help/insight/opinions as to the best ways to deal with
the following issues.
Core model + bio extensions. Is there flexibility here? Can any
of this be changed without breaking things? Which parts?
- a fair amount of stuff is irrelevant for our data and so
will remain unpopulated in the mine. I know we can just ignore these
parts (and that's fine for now), but it seems a bit awkward, e.g.,
to have "dead" classes available in the query builder.
- some aspects conflict with the data we have. For example,
neither Alleles nor Proteins are subclasses of BioEntity, and so
cannot have OntologyAnnotations. Again, I know we can define
our own subclasses of BioEntity (MGIAllele, MGIProtein, or whatever),
but that seems messy.
A larger question is whether/how the different mines (at least, the
InterMOD ones) coordinate their model extensions. I'm assuming everyone
pretty much extends the core model for their own purposes, and it's
a great strength of Intermine that this is possible. But it also
raises issues for interoperability as the mines' models diverge.
Source control/versioning. I'm wondering how people are approaching
version control of their mines' components (config files, source
code, etc.) as distinct from Intermine itself?
Loading lots of data from a relational db. Most of our data
will come out of MGI. There's lots of it and lots of different types.
Should this be one big load or lots of little ones? Should the
loads connect to the db directly, or should the db get dumped
in Items XML format and we load that? If loading Items XML, is it
better to load one big file, or a directory of smaller ones?
Many thanks in advance,
Joel Richardson, Ph.D.
Sr. Research Scientist
Mouse Genome Informatics
The Jackson Laboratory Phone: (207) 288-6435
600 Main Street Fax: (207) 288-6132
Bar Harbor, Maine 04609 URL: www.informatics.jax.org