[InterMine Dev] UniProt Parser problems.

James Blackshaw jab250 at mrc-mbu.cam.ac.uk
Wed Sep 14 11:09:06 BST 2011


We would like to know if the UniProt parser could be changed to allow 
loading of fragments as an option, as we'd like to include them in 
MitoMiner and would rather not have to hack the parser to do it.

Also, we've noticed a problem with the parser when it encounters 
entries that used to share an accession number. For example, P06748 
and P10276 were both formerly known as Q13440, but are now recorded as 
two separate proteins. As both UniProt entries list Q13440 as a 
secondary accession, when the parser encounters it a second time it 
considers the record to be a duplicate and does not populate any fields. 
It affects only a handful of records in our dataset, which is why we 
didn't notice it until now. The problem is that if we take out the 
duplicate check, we of course wind up with a lot of duplicate entries. 
Could the parser be given an option to check only the first (primary) 
UniProt accession of each entry?
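
To illustrate the behaviour we are asking for, here is a minimal Python 
sketch. It is not the InterMine parser code: the function name 
load_entries, the primary_only flag and the list-of-accessions input 
are all hypothetical, and skipping a record stands in for whatever the 
real parser does with a duplicate. Each entry is a list of accessions 
with the primary one first, trimmed to the accessions mentioned above.

```python
def load_entries(entries, primary_only=True):
    """Return the primary accessions of the entries that would be loaded.

    entries: list of accession lists, primary accession first.
    primary_only: if True, only the primary accession is used for the
    duplicate check; if False, every accession (primary and secondary)
    is checked, which mimics the behaviour described above.
    """
    seen = set()
    loaded = []
    for accessions in entries:
        primary = accessions[0]
        keys = [primary] if primary_only else accessions
        if any(key in seen for key in keys):
            # Treated as a duplicate of an earlier entry, so skipped.
            continue
        seen.update(keys)
        loaded.append(primary)
    return loaded


# Hypothetical input: two distinct proteins sharing a secondary
# accession, followed by a genuine repeat of the first entry.
entries = [
    ["P06748", "Q13440"],
    ["P10276", "Q13440"],
    ["P06748", "Q13440"],
]

all_accessions_check = load_entries(entries, primary_only=False)
primary_only_check = load_entries(entries, primary_only=True)
```

With the check against all accessions, P10276 is dropped because 
Q13440 has already been seen. With the check restricted to the primary 
accession, both proteins are kept and the genuine repeat of P06748 is 
still rejected.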

