[InterMine Dev] source_keys.properties file not read

Thomas TRIPLET thomastriplet at gmail.com
Thu Sep 8 15:30:31 BST 2011


Hi Julie,
Thanks for the information.

On Thu, Sep 8, 2011 at 4:56 AM, Julie Sullivan <julie at flymine.org> wrote:

> Is that the only source you are running?  The keys in that file are
> integration keys, they aren't used when loading a single source.  The build
> system uses the fields listed in that file to merge objects created in that
> source with objects already stored in the database, the file is not used
> when loading new objects.



I tried with both one and two sources (the second source maps EC numbers to
protein IDs). The problem is essentially the same, although the symptoms
differ. With a single source, the keys are not used, as you just mentioned,
which results in duplicates. With two sources, the keys are read, and the
build fails with an error message complaining about the same object being
stored twice.


On Thu, Sep 8, 2011 at 4:56 AM, Julie Sullivan <julie at flymine.org> wrote:

> If you need unique objects in a single source, you'll have to do that in
> the converter.


I believe I now understand what happened. I actually have two files to
process, in different formats. In process(), I have one function for each
file, and each function also stores the items it creates. Since I had defined
a primary key, I thought an item stored from the first file would be updated
with the content of the second. This is not the case, however, because the
keys are *integration* keys, only used when two or more sources are defined.
So parsing the two files generated duplicates, which caused the error when
integrating another source.

If I may make a suggestion, I think it would be beneficial to enforce the
primary keys even when only one source is used, as the current behaviour is a
little confusing. It probably happens quite often that a source is composed
of several files that contain redundant information. Since *process()* is
called independently for each file (as far as I can tell), there is no way to
detect this in the converter (especially since the converter cannot access
items in the production db).
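
For illustration, here is a minimal sketch of the "do that in the converter"
approach Julie describes. It rests on one assumption that this thread does not
confirm: that the same converter instance receives every file of the source,
so a map kept in a field survives between calls to process(). The class name
EnzymeDedupSketch, the close() hook and the tab-separated input format are all
hypothetical; none of this is InterMine API, and the two real files are in
different formats.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a converter; not part of the InterMine API.
 * Assumes one instance sees every file of the source.
 */
public class EnzymeDedupSketch {
    // ecNumber -> attributes collected so far, across all files
    private final Map<String, Map<String, String>> byEcNumber = new LinkedHashMap<>();

    /** Called once per file; each line is "ecNumber<TAB>attribute<TAB>value". */
    public void process(Reader reader) {
        try {
            BufferedReader in = new BufferedReader(reader);
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", 3);
                if (cols.length < 3) {
                    continue; // skip malformed lines
                }
                // Reuse the record for this EC number, or create it exactly once.
                byEcNumber.computeIfAbsent(cols[0], k -> new LinkedHashMap<>())
                          .put(cols[1], cols[2]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called after the last file; a real converter would store each item here, once. */
    public Map<String, Map<String, String>> close() {
        return byEcNumber;
    }

    public static void main(String[] args) {
        EnzymeDedupSketch converter = new EnzymeDedupSketch();
        // Two "files" that both mention EC 1.1.1.1 (sample data only).
        converter.process(new StringReader("1.1.1.1\tacceptedName\talcohol dehydrogenase\n"));
        converter.process(new StringReader("1.1.1.1\tdescription\tsample description\n"));
        System.out.println(converter.close().size() + " item(s) to store");
    }
}
```

The point of the design is that nothing is stored while parsing: items are
only collected, keyed by ecNumber, and written once after the last file, so
the second file updates the existing record instead of creating a duplicate.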

Best,
Thomas

--
Thomas Triplet, Ph.D.
http://www.thomastriplet.net





On Thu, Sep 8, 2011 at 4:56 AM, Julie Sullivan <julie at flymine.org> wrote:

> Hi Thomas
>
> Your keys and additions files look correct to me.
>
> Is that the only source you are running?  The keys in that file are
> integration keys, they aren't used when loading a single source.  The build
> system uses the fields listed in that file to merge objects created in that
> source with objects already stored in the database, the file is not used
> when loading new objects. If you need unique objects in a single source,
> you'll have to do that in the converter.  Here's some more information on
> the keys file:
>
>        http://intermine.org/wiki/PrimaryKeys
>
> Also, try naming each key, eg. instead of DataSet.key use DataSet.key_name
> (which is the same name used by other sources).  This is the name of the
> index used by postgres, it's a good idea to have the same name for each
> across sources so you don't end up with duplicates.
>
> Let me know if that doesn't solve your problem.
>
> Cheers,
> Julie
>
>
> On 07/09/11 23:34, Thomas TRIPLET wrote:
>
>> Hello,
>> I've created a data source for the Enzyme Nomenclature, which works fine,
>> except that I am getting duplicates in the database (same EC number). I did
>> define a key in the keys.properties file though, but it is as if the file
>> wasn't parsed at all.
>>
>> The additions.xml file contains the following:
>> <classes>
>> <class name="EnzymeClassification" is-interface="true">
>>  <attribute name="ecNumber" type="java.lang.String"/>
>>  <attribute name="acceptedName" type="java.lang.String"/>
>>  <attribute name="systematicName" type="java.lang.String"/>
>>  <attribute name="description" type="java.lang.String"/>
>>  <attribute name="isObsolete" type="boolean"/>
>>  <reference name="parentNode" referenced-type="EnzymeClassification"/>
>>  <collection name="dataSets" referenced-type="DataSet" />
>>  </class>
>> </classes>
>>
>> In the keys.properties, I have
>>
>> DataSet.key = name
>> EnzymeClassification.key = ecNumber
>> EnzymeClassification.key_name = acceptedName
>>
>>
>>
>> Am I missing something or is there a way to force the keys definitions to be
>> read?
>> Thanks
>> Thomas
>>
>> --
>> Thomas Triplet, Ph.D.
>> http://www.thomastriplet.net
>>
>>
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev at intermine.org
>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>
>