[InterMine Dev] many-to-one storage
julie at flymine.org
Mon Aug 24 10:48:32 BST 2009
Hello! I've taken a look at your converter and made some changes. I tried to
add a lot of comments, so I hope it's clear what I've done.
1. You can specify a specific file to convert in your project.xml file. Take a
look at FlyMine's project.xml file for an example:
<source name="tiffin-expression" type="tiffin-expression">
  <property name="src.data.dir" location="/shared/data/tiffin/"/>
  <property name="src.data.dir.includes" value="tiffin_ImaGo"/>
</source>
The tiffin-expression source will process only the "tiffin_ImaGo" file in
the /shared/data/tiffin directory. Add this to the expression entry in your
project.xml file:
<property name="src.data.dir.includes" value="xpat.txt"/>
Then your converter will only ever process "xpat.txt", so you won't need an
extra process method (I've removed it from your converter).
You can also use wildcards to filter the files, if you wanted to process all TXT
files in the specified directory for example:
<property name="src.data.dir.includes" value="*.txt"/>
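To get a feel for which files a pattern like "*.txt" would pick up, here is a
minimal sketch in plain Java. It is only an approximation: it uses Java's own
glob matcher as a stand-in, not the mechanism the build actually uses to read
src.data.dir.includes. The class name IncludesFilterSketch, the filter() helper
and the file names other than "xpat.txt" and "tiffin_ImaGo" are made up for
illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of InterMine: approximates an "includes"
// pattern with Java's glob syntax.
class IncludesFilterSketch {

    // Returns the file names that match the glob, in input order.
    static List<String> filter(String glob, List<String> fileNames) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names =
            Arrays.asList("xpat.txt", "tiffin_ImaGo", "notes.txt", "README");
        System.out.println(filter("*.txt", names));    // every .txt file
        System.out.println(filter("xpat.txt", names)); // one named file
    }
}
```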
2. Your additions file looks correct to me.
3. Regarding one-to-many relationships: you have to store both objects, so
both the gene and the expression object have to be saved to the database.
But you only have to set one side of the *relationship*. You don't have to
set both. Does that make sense? So you only need to set expression --> gene,
you don't need gene --> expressions.
You had both of these, so I removed one:
* this automatically creates the gene --> expressions relationship
* you don't need this because this relationship is already generated
using the statement above. I've removed this from your converter.
4. Storing items.
You generally want to store items to the database as soon as you can, for memory
reasons. The rule is: once you are done updating an object, store it.
However, once you store the object to the database, you can't change it or do
I didn't see where your converter was storing the gene, so I've added a
line to store the object as soon as you know the identifier.
I also moved the storing of the expression. I hope that's okay.
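
Points 3 and 4 together can be sketched roughly as follows. This is not the
attached converter and it does not use the InterMine classes: Item,
createItem() and store() here are small stand-ins written for this example,
and the "0_1"-style ids are invented. It shows each gene being stored once, as
soon as its identifier is known, only the expression --> gene reference being
set, and the expression being stored once nothing more will be added to it.
Storing an expression that has no gene is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch, not the InterMine API.
class StoreOrderSketch {

    // Minimal stand-in for an item: attributes plus references by item id.
    static class Item {
        final String id;
        final String className;
        final Map<String, String> attributes = new HashMap<>();
        final Map<String, String> references = new HashMap<>();

        Item(String id, String className) {
            this.id = id;
            this.className = className;
        }
    }

    final List<Item> stored = new ArrayList<>();          // stand-in database
    final Map<String, Item> genesByIdentifier = new HashMap<>();
    private int nextId = 1;

    Item createItem(String className) {
        return new Item("0_" + nextId++, className);
    }

    void store(Item item) {
        stored.add(item);
    }

    // Create and store a gene the first time its identifier is seen;
    // afterwards reuse the same item so it is never stored twice.
    Item getGene(String geneId) {
        Item gene = genesByIdentifier.get(geneId);
        if (gene == null) {
            gene = createItem("Gene");
            gene.attributes.put("primaryIdentifier", geneId);
            store(gene); // nothing else to add, so store it right away
            genesByIdentifier.put(geneId, gene);
        }
        return gene;
    }

    void processRow(String geneId, String expressionId) {
        Item expression = createItem("Expression");
        expression.attributes.put("primaryIdentifier", expressionId);
        // Only set the reference when there really is a gene.
        if (geneId != null && !geneId.isEmpty()) {
            // expression --> gene only; gene --> expressions is not set here.
            expression.references.put("gene", getGene(geneId).id);
        }
        store(expression); // done updating the expression, so store it
    }

    public static void main(String[] args) {
        StoreOrderSketch converter = new StoreOrderSketch();
        converter.processRow("gene-1", "xpat-1");
        converter.processRow("gene-1", "xpat-2"); // same gene, not stored again
        converter.processRow(null, "xpat-3");     // no gene, so no reference
        System.out.println(converter.stored.size() + " items stored");
    }
}
```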
PS. I've added a couple of comments below.
Sierra Moxon wrote:
> Hi again-
> I asked during the last conference call about storing genes&expressions.
> A gene has many "expressions" and an "expression" has one gene. You
> mentioned, I think, (and I did find some doc), that says I need to store
> the "expression" object, but not the "gene" object in this kind of
You have to store both objects.
> So, more details, I have a file that gives me two ids (I'm simplifying a
> bit to make the questions easier to answer, I hope):
> gene_id xpression_id
> So, I map the expression and gene like this:
> <class name="Expression" is-interface="true">
> <attribute name="primaryIdentifier" type="java.lang.String"/>
> <reference name="gene" referenced-type="Gene"
> <class name="Gene" extends="BioEntity" is-interface="true">
> <collection name="expressions" referenced-type="Expression"
That looks correct to me.
> My code sets the gene reference on the expression Item I have created,
> then tries to save the expression. No gene objects exist in my db
> before this load. It does not try to save the gene object (assuming
> this gets created and stored by the addition of the expression object?).
No, sorry, I don't think this was clear. The gene needs to be stored. What gets
created automatically is the reverse relationship (gene --> expressions), not
the gene itself.
> I get the error after items are loaded into the tgt db:
> java.lang.RuntimeException: java.lang.RuntimeException: failed to find
> referenced item in object store: 5_1
It's looking for the gene. You can look in your common-tgt-items database for
object 5_1 if you aren't sure what is missing.
> If I try saving the gene and the expression, I get duplicate errors from
I think it's because the store() method was being called in the wrong place.
store(expression) was being called outside of the IF statement checking for a
valid gene, for example. I've moved it.
> If I take off the reverse-references and store both the gene and the
> expression objects, then the load succeeds.
The build didn't fail, but the data was incorrect.
> Other details:
> * sometimes the Gene is null (the full constraint is that gene, probe,
> and antibody are all nullable, but if gene is null, then antibody is not
If that is true, then you need to check that the gene is not null before setting
the reference. Otherwise it'll throw an error. Attributes and references can't
> * I do this kind of mapping for an expression's EST, and Antibody
> Do I have to have the gene loaded (with PK set, etc...) before I load
> the expression? Perhaps from an identifier load?
You only need to worry about primary keys if you are merging these data with
other data from another source.
> Can I have null references when I go to store an object like the
> Expression object?
No. You can't reference an object that doesn't exist.
You *must* store that gene at some time during the build.
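
As an illustration of why the "failed to find referenced item" error shows up,
here is a small sketch in plain Java, not the real loader code. It checks that
every reference points at an id that was actually stored. The method name
findMissing() and the referring id "6_1" are invented for the example; "5_1"
is the id from the error message above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: models a reference to an item that was never stored.
class DanglingReferenceSketch {

    // storedIds: ids of items that were stored.
    // references: referring item id -> referenced item id.
    // Returns each referenced id that was never stored, without repeats.
    static List<String> findMissing(Set<String> storedIds,
                                    Map<String, String> references) {
        List<String> missing = new ArrayList<>();
        for (String target : references.values()) {
            if (!storedIds.contains(target) && !missing.contains(target)) {
                missing.add(target);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> storedIds = new HashSet<>();
        storedIds.add("6_1");                 // the expression was stored

        Map<String, String> references = new LinkedHashMap<>();
        references.put("6_1", "5_1");         // expression --> gene

        // The gene 5_1 was referenced but never stored.
        System.out.println("missing: " + findMissing(storedIds, references));

        storedIds.add("5_1");                 // store the gene as well
        System.out.println("missing: " + findMissing(storedIds, references));
    }
}
```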
> I attached my loader and my mapping, sorry if it turns out to be a silly
> mistake...I've been staring at this waaay too long and I'm tempted to
> just turn it back the way it was with no reverse-references, but I think
> the model would be incorrect that way.
I've moved a couple of things around. I tried to add a lot of comments, but
please ask me if you have questions!
> Thanks for your help,