[InterMine Dev] Merging duplicate entries

Thomas TRIPLET thomastriplet at gmail.com
Mon Mar 11 18:11:36 GMT 2013


Thanks Julie, I forgot about that file. That solved my issue.
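
For anyone hitting the same problem: the file in question is presumably the keys file of the custom source. A minimal sketch, assuming the source is named csfg-protein-name and follows the usual <source-name>_keys.properties convention in the source's resources directory (the file name and the key label after "key_" are assumptions, not taken from this thread):

```properties
# csfg-protein-name_keys.properties (assumed file name)
# Integration key: Protein objects loaded by this source are merged with
# existing Protein objects that have the same primaryIdentifier.
Protein.key_primaryidentifier = primaryIdentifier
```

Without a key for Protein, the integration step has nothing to match on, so each source ends up storing its own copy of the protein.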

--
Thomas Triplet, Jr. Eng., Ph.D.
http://www.thomastriplet.net


On Mon, Mar 11, 2013 at 12:14 PM, Julie Sullivan <julie at flymine.org> wrote:

> Do you have the integration key for Protein set to be "primary identifier"?
>
> http://intermine.readthedocs.org/en/latest/database/database-building/primary-keys
>
>
> On 11/03/13 15:49, Thomas TRIPLET wrote:
>
>> Hello,
>> I'm importing 2 data sources describing proteins. One is a typical FASTA
>> file, the other one is a CSV file imported using a custom source, which is
>> based on examples from FlyMine:
>>
>>
>>     public void process(Reader reader) throws Exception {
>>         Iterator<?> lineIter = FormattedTextParser.parseTabDelimitedReader(reader);
>>
>>         if (lineIter.hasNext()) { // skip the header line
>>             lineIter.next();
>>         }
>>
>>         while (lineIter.hasNext()) {
>>             try {
>>                 String[] line = (String[]) lineIter.next();
>>
>>                 // skip empty or commented-out lines
>>                 if (line == null || line[0].startsWith("#")) {
>>                     continue;
>>                 }
>>
>>                 String proteinId = line[PROTEIN_IDX];
>>                 Item protein = getProtein(proteinId);
>>
>>                 protein.setAttribute("name", line[NAME_IDX]);
>>             } catch (Exception e) {
>>                 System.out.println("ERROR occurred while converting aniger-protein-name ("
>>                         + e.getMessage() + ")");
>>                 e.printStackTrace();
>>                 System.exit(-1);
>>             }
>>         }
>>         // store each protein once, after all lines have been read
>>         for (Item protein : proteins.values()) {
>>             store(protein);
>>         }
>>     } // eo process()
>>
>>     /**
>>      * Creates a protein or fetches it if it already exists
>>      * @param id ID of the protein
>>      * @return The protein as an Item
>>      */
>>     private Item getProtein(String id) throws ObjectStoreException {
>>         Item protein = proteins.get(id);
>>         if (protein == null) {
>>             protein = createItem("Protein");
>>             protein.setAttribute("primaryIdentifier", id);
>>
>>             proteins.put(id, protein);
>>         }
>>         return protein;
>>     } // eo getProtein()
>>
>>
>> In project.xml, I have:
>>
>> <source name="aniger-protein-fasta" type="fasta">
>>   <property name="fasta.className" value="org.intermine.model.bio.Protein"/>
>>   <property name="fasta.classAttribute" value="primaryIdentifier"/>
>>   <property name="fasta.sequenceType" value="protein"/>
>>   <property name="fasta.dataSourceName" value="CSFG"/>
>>   <property name="fasta.dataSetTitle" value="Protein sequences in A. niger"/>
>>   <property name="fasta.taxonId" value="5061"/>
>>   <property name="fasta.includes" value="Aspni3p4.representatives.faa"/>
>>   <property name="src.data.dir" location="/home/intermine/data/csfg/a_niger/"/>
>> </source>
>> <source name="aniger-protein-name" type="csfg-protein-name">
>>   <property name="src.data.dir" location="/home/intermine/data/csfg/a_niger/"/>
>>   <property name="src.data.dir.includes" value="Aspni3p4_annotations_wf_march2013.csv"/>
>> </source>
>>
>> The IDs in the 2 sources match. Yet, after a successful build, the UI
>> shows 2 proteins with the same primaryIdentifier. Is there a way to force
>> entities with the same ID to merge?
>>
>> Thanks
>>
>> --
>> Thomas Triplet, Jr. Eng., Ph.D.
>> http://www.thomastriplet.net