[InterMine Dev] Gff3 loader custom Gff3 Handler questions

Fengyuan Hu fh293 at cam.ac.uk
Wed Aug 28 15:05:27 BST 2013


I also noticed that the synonym info was not parsed. Here is the fix:
https://github.com/intermine/intermine/commit/42e86288c355a20b2c9a13719fbdd2afab286a31

Why is EnsemblGenes duplicated in your gff3? With the fix, the duplicate 
entries are removed.
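
To illustrate the duplication, here is a minimal Python sketch (not the 
InterMine converter code; the function names parse_attributes and 
dbxrefs_by_source are made up for this example). It assumes the standard 
GFF3 layout of the attribute column: key=value pairs separated by ";", 
multiple values separated by ",", and Dbxref entries of the form DB:ID. 
It ignores URL-escaping for brevity.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> list of values."""
    attributes = {}
    for field in column9.strip().strip(";").split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attributes[key.strip()] = [v.strip() for v in value.split(",")]
    return attributes


def dbxrefs_by_source(attributes):
    """Group Dbxref entries by database name, skipping repeated IDs."""
    grouped = {}
    for entry in attributes.get("Dbxref", []):
        source, _, identifier = entry.partition(":")
        ids = grouped.setdefault(source, [])
        if identifier not in ids:  # keep first occurrence only
            ids.append(identifier)
    return grouped


# Shortened attribute column taken from the Tnp2 gene line in this thread.
column9 = ("Name=Tnp2;ID=RGD:3885;"
           "Dbxref=EntrezGene:24840,UniGene:10430,"
           "EnsemblGenes:ENSRNOG00000002566,UniProt:P11101,"
           "UniProt:B3LF38,EnsemblGenes:ENSRNOG00000002566;")
xrefs = dbxrefs_by_source(parse_attributes(column9))
print(xrefs["EnsemblGenes"])  # ['ENSRNOG00000002566']
print(xrefs["UniProt"])       # ['P11101', 'B3LF38']
```

EnsemblGenes:ENSRNOG00000002566 appears twice in the Dbxref list, so without 
the membership check the same synonym would be collected twice.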

Cheers
Fengyuan

On 28/08/13 11:58, Fengyuan Hu wrote:
> Hi Pushkala,
>
> I have just pushed a patch, please take a look here:
> https://github.com/intermine/intermine/commit/08a07641b14c2b1b759bfb5cb0d3e5cde626e777
>
> I don't have ncbiGeneNumber in my model, so could you please test it on 
> your side?
>
> Cheers
> Fengyuan
>
> On 27/08/13 17:02, Fengyuan Hu wrote:
>> Hi Pushkala,
>>
>> Sorry for the delay. I think I know where it goes wrong in the 
>> converter, but I need some time to fix it. Please bear with me.
>>
>> Fengyuan
>>
>> On 12/08/13 19:06, Jayaraman, Pushkala wrote:
>>>
>>> Hello,
>>>
>>> I have a couple questions regarding the Gff3 handler.
>>>
>>> I have a gff3 file that looks like the sample below, and a 
>>> gff_config.properties file with the following attributes:
>>>
>>> 10116.terms=gene, mRNA, Exon, CDS, ThreePrimeUTR, FivePrimeUTR
>>>
>>> 10116.attributes.ID=primaryIdentifier
>>>
>>> 10116.attributes.ID=secondaryIdentifier
>>>
>>> 10116.attributes.Note=description
>>>
>>> 10116.attributes.Dbxref.EntrezGene=ncbiGeneNumber
>>>
>>> 10116.attributes.Dbxref.EnsemblGenes=synonym
>>>
>>> I'm having problems loading this data: I find that the description 
>>> doesn't get loaded.
>>>
>>> From what I understood in the docs, adding the required fields to 
>>> the gff_config.properties will allow the gff3 parser to extract 
>>> those values and assign them to the required fields in the gene 
>>> model. Am I using the gff_config.properties file incorrectly?
>>>
>>> Or am I supposed to write a custom gff3 parser irrespective of what 
>>> I have in the gff_config.properties file?
>>>
>>> 10      RGD     gene    4816612 4817340 .       +       . Name=Tnp2;Alias=RGD3885,3885,transition protein 2;ID=RGD:3885;Note=ENCODES a protein that exhibits zinc ion binding AND  INVOLVED IN acrosome reaction (ortholog) AND  binding of sperm to zona pellucida (ortholog) AND  penetration of zona pellucida (ortholog) AND  FOUND IN nucleus AND  INTERACTS WITH 17alpha-ethynylestradiol AND ammonium chloride AND  cadmium dichloride;fullName=transition protein 2;Dbxref=EntrezGene:24840,UniGene:10430,IMAGE_CLONE:7131008,MGC_CLONE:BC078849,EnsemblGenes:ENSRNOG00000002566,UniProt:P11101,UniProt:B3LF38,EnsemblGenes:ENSRNOG00000002566;
>>>
>>> 10      RGD     gene    56399721 56411150        .       +       . Name=Tp53;Alias=RGD3889,3889,tumor protein p53;ID=RGD:3889;Note=ENCODES a protein that exhibits protein C-terminus binding AND sequence-specific DNA binding AND  ubiquitin protein ligase binding AND  INVOLVED IN aging AND  cellular response to organonitrogen compound AND  negative regulation of DNA biosynthetic process AND  PARTICIPATES IN altered p53 signaling pathway AND  endometrial cancer pathway AND  non-small cell lung cancer pathway AND  ASSOCIATED WITH Dementia  Vascular AND Diabetic Nephropathies AND  Ischemia AND  FOUND IN chromatin AND  cytoplasm AND  cytosol AND  INTERACTS WITH (-)-citrinin AND  (-)-epigallocatechin 3-gallate AND  (R)-lipoic acid;fullName=tumor protein p53;Dbxref=PharmGKB:PA36679,EntrezGene:24842,UniGene:54443,EnsemblGenes:ENSRNOG00000010756,KEGGPathway:04010,KEGGPathway:04110,KEGGPathway:04115,KEGGPathway:04210,KEGGPathway:04310,KEGGPathway:04722,KEGGPathway:05014,KEGGPathway:05016,KEGGPathway:05200,KEGGPathway:05210,KEGGPathway:05212,KEGGPathway:05213,KEGGPathway:05214,KEGGPathway:05215,KEGGPathway:05216,KEGGPathway:05217,KEGGPathway:05218,KEGGPathway:05219,KEGGPathway:05220,KEGGPathway:05222,KEGGPathway:05223,IMAGE_CLONE:7193583,MGC_CLONE:BC081788,IMAGE_CLONE:7384467,MGC_CLONE:BC098663,UniProt:P10361,EnsemblGenes:ENSRNOG00000010756,KEGGPathway:05160,KEGGPathway:05162,KEGGPathway:05166,KEGGPathway:05168,KEGGPathway:04151,KEGGPathway:05161,KEGGPathway:05203,KEGGPathway:05202,KEGGPathway:05169,KEGGPathway:05205,UniProt:Q9JLD9;
>>>
>>> 10      RGD     mRNA    4816612 4817340 .       +       . Name=NM_017057;Parent=RGD:3885;gene=Tnp2;RefSeqStatus=PROVISIONAL;Alias=RGD:2752358;ID=mRNARGD2752358_t00;isNon-Coding=N;
>>>
>>> 10      RGD     mRNA    56399721 56411150        .       +       . Name=NM_030989;Parent=RGD:3889;gene=Tp53;RefSeqStatus=REVIEWED;Alias=RGD:2752318;ID=mRNARGD2752318_t00;isNon-Coding=N;
>>>
>>> Pushkala Jayaraman
>>>
>>> Programmer/Analyst - Rat Genome Database
>>>
>>> Human and Molecular Genetics Center
>>>
>>> Medical College of Wisconsin
>>>
>>> 414-955-2229
>>>
>>> http://rgd.mcw.edu
>>>
>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at intermine.org
>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
