[InterMine Dev] UniProt Parser problems.
jab250 at mrc-mbu.cam.ac.uk
Wed Sep 14 11:09:06 BST 2011
We would like to know if the UniProt parser could be changed to allow
loading of fragments as an option, as we'd like to include them in
MitoMiner and would rather not have to hack the parser to do it.
Also, we've noticed a problem with the parser when it runs across
entries that used to be covered by the same accession number. For
example P06748 and P10276 both used to be known as Q13440, but are now
recorded as two separate proteins. As the UniProt entries both contain
Q13440 as a secondary accession, when the parser encounters it again it
considers the record to be a duplicate and does not populate any fields.
It's only a handful of records in our dataset, which is why we didn't
notice it until now. The problem is that if we take out the checking for
duplicates, of course we wind up with a lot of duplicate entries. Could
the parser be changed to have an option to only check against the first
UniProt accession in a file?
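
To illustrate the behaviour we have in mind, here is a minimal sketch in
Python. It is not the actual parser code; the entry structure and the
function name dedupe_by_primary_accession are hypothetical, and only the
accessions mentioned above are shown. The idea is to treat a record as a
duplicate only when its first (primary) accession has already been seen,
and to ignore secondary accessions for that check.

```python
def dedupe_by_primary_accession(entries):
    """Keep the first entry seen for each primary accession.

    Each entry is a dict with an "accessions" list, primary accession
    first (hypothetical structure, for illustration only).
    """
    seen_primary = set()
    kept = []
    for entry in entries:
        primary = entry["accessions"][0]
        if primary in seen_primary:
            continue  # genuine duplicate: same primary accession seen before
        seen_primary.add(primary)
        kept.append(entry)
    return kept


entries = [
    {"accessions": ["P06748", "Q13440"]},
    {"accessions": ["P10276", "Q13440"]},  # shares only a secondary accession
    {"accessions": ["P06748", "Q13440"]},  # true duplicate of the first entry
]

kept = dedupe_by_primary_accession(entries)
print([e["accessions"][0] for e in kept])
```

With a check like this, P10276 would still be loaded even though it
shares Q13440 with P06748, while a repeated P06748 record would be
skipped.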