[InterMine Dev] how to define LocatedOn attribute of Location and avoid duplicate objects error

Dr. Intikhab Alam intikhab.alam at kaust.edu.sa
Mon Mar 5 16:51:00 GMT 2012


Thanks for your quick response, Richard,



On 3/5/12 4:41 PM, "Richard Smith" <richard at flymine.org> wrote:

>On 05/03/2012 16:30, Dr. Intikhab Alam wrote:
>> Dear Richard,
>> 
>> Thank you for your email.
>> 
>> 
>> 
>> On 3/5/12 3:39 PM, "Richard Smith"<richard at flymine.org>  wrote:
>> 
>>> On 05/03/2012 15:23, Dr. Intikhab Alam wrote:
>>>> Dear Richard,
>>>>
>>>> Thanks for your email and possible solution.
>>>>
>>>> What is the primaryKey on which organism data is merged? Is it taxonId,
>>>> right?
>>>
>>> Yes.
>>>
>>>> My XML sources have the Organism defined only once; there are no
>>>> duplicate occurrences of taxonIds, though I load multiple XML files.
>>>> My XML files contained information other than taxonIds as well, which
>>>> perhaps integrate doesn't like, as this information is already in the
>>>> database when the flymine data is loaded.
>>>
>>> Having multiple XML files is the problem.  The data loading code will
>>> read all of the files in one load, so if each file contains an organism
>>> it will still count as a duplicate.  It would be best to create one file
>>> with all the data in it and only one organism.
>> 
>> Here each XML file is a separate source of type large-xml, so how can I
>> combine them into one? For example, there are cageTags with nucleotide
>> lengths 26, 27 and 28, each for the forward and reverse strand, etc.
>> 
>Ah, in that case I don't know what the problem with the XML is, but the
>error message means that two organism items have been loaded.  Is it
>possible that more than one XML was read for one of the sources?  This
>can happen if they are in sub-directories.

Yes, it failed again, probably because of the sub-directories. I will make
sure these files are kept separate now.
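Since any XML file in a sub-directory of a source's data directory may be read in the same load, a quick pre-check before re-running integrate is to list every XML file under that directory recursively and count how many Organism items they declare. This is a minimal sketch under stated assumptions: the helper name `check_source_dir` is hypothetical, and the grep pattern assumes Organism items appear in the items XML with a `class` attribute ending in `Organism`, which may need adjusting for your files.

```shell
#!/bin/sh
# Hypothetical helper (not part of InterMine): list every XML file under a
# source directory, including sub-directories, and count Organism items.
check_source_dir() {
    dir="$1"
    # Recursively find all .xml files; files in sub-directories are listed too.
    files=$(find "$dir" -type f -name '*.xml' | sort)
    # Count non-empty lines, i.e. the number of XML files found.
    count=$(printf '%s\n' "$files" | grep -c .)
    echo "XML files found: $count"
    printf '%s\n' "$files"
    # Assumed pattern: each Organism item carries class="...Organism".
    # grep -o prints one line per match, so wc -l counts the matches.
    orgs=$(find "$dir" -type f -name '*.xml' \
        -exec grep -o 'class="[^"]*Organism"' {} + | wc -l | tr -d ' ')
    echo "Organism items: $orgs"
}
```

For example, `check_source_dir /path/to/cagetags-source` (a placeholder path) should report one XML file and one Organism item for a source that is set up as intended; a higher file count points at stray files in sub-directories.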


>
>>>
>>>> What do you think?
>>>>
>>>> If I recall correctly, when I was loading InterPro domains, if I
>>>> provided any attribute other than the primaryId, it failed with a
>>>> duplicate objects error at the data integration stage; when I provided
>>>> only the primaryId, all went fine.
>>>>
>>>> Now I am trying to use only the taxonId for Organism when I format my
>>>> XML file, and I will try again. It takes a few hours to load the
>>>> flymine data, and if it fails at the integration stage, I load it
>>>> again, update the model and run integrate again, as shown on the
>>>> "build flymine with own data" page.
>>>
>>> You could speed this up by creating a database copy once the dump is
>>> loaded up, e.g.
>>>
>>> createdb -T flymine-db flymine-db-backup

Now that the build has failed, the database already includes my cageTag
tables with data, and the model has been updated to include them. Can I
make a copy now, or do I need to resolve the duplication problem first
and make a clean copy afterwards?


Thanks, I now understand the Gbrowse issue. Will it be fixed soon?

Best Wishes,

Intikhab