[InterMine Dev] Gff3 loader custom Gff3 Handler questions

Jayaraman, Pushkala pjayaraman at mcw.edu
Mon Aug 12 19:06:35 BST 2013


Hello,
I have a couple of questions regarding the GFF3 handler.
I have a GFF3 file that looks like the sample below, and my gff_config.properties file has the following attributes:

10116.terms=gene, mRNA, Exon, CDS, ThreePrimeUTR, FivePrimeUTR
10116.attributes.ID=primaryIdentifier
10116.attributes.ID=secondaryIdentifier
10116.attributes.Note=description
10116.attributes.Dbxref.EntrezGene=ncbiGeneNumber
10116.attributes.Dbxref.EnsemblGenes=synonym
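
One thing worth checking in the config above is that the key 10116.attributes.ID appears twice. The following is a minimal Python sketch, not InterMine code, of what happens if the file is read with last-value-wins semantics, as java.util.Properties does. Whether the GFF3 loader reads the file this way is an assumption here, and load_properties is a hypothetical helper written only for illustration.

```python
def load_properties(text):
    """Read simple key=value lines; a repeated key overwrites the earlier value.

    This mimics last-value-wins behaviour only. It does not handle escapes,
    line continuations, or ':' separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


config = """\
10116.attributes.ID=primaryIdentifier
10116.attributes.ID=secondaryIdentifier
10116.attributes.Note=description
"""

props = load_properties(config)
print(props["10116.attributes.ID"])    # secondaryIdentifier
print(props["10116.attributes.Note"])  # description
```

Under that assumption only the second ID mapping would survive, so the first one would be silently dropped.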


I'm having problems loading this data: the description doesn't get loaded.
From what I understood in the docs, adding the required fields to the gff_config.properties file allows the GFF3 parser to extract those values and assign them to the corresponding fields in the gene model. Am I using the gff_config.properties file incorrectly?
Or am I supposed to write a custom GFF3 parser regardless of what I have in the gff_config.properties file?


10      RGD     gene    4816612 4817340 .       +       .       Name=Tnp2;Alias=RGD3885,3885,transition protein 2;ID=RGD:3885;Note=ENCODES a protein that exhibits zinc ion binding AND  INVOLVED IN acrosome reaction (ortholog) AND  binding of sperm to zona pellucida (ortholog) AND  penetration of zona pellucida (ortholog) AND  FOUND IN nucleus AND  INTERACTS WITH 17alpha-ethynylestradiol AND  ammonium chloride AND  cadmium dichloride;fullName=transition protein 2;Dbxref=EntrezGene:24840,UniGene:10430,IMAGE_CLONE:7131008,MGC_CLONE:BC078849,EnsemblGenes:ENSRNOG00000002566,UniProt:P11101,UniProt:B3LF38,EnsemblGenes:ENSRNOG00000002566;
10      RGD     gene    56399721        56411150        .       +       .       Name=Tp53;Alias=RGD3889,3889,tumor protein p53;ID=RGD:3889;Note=ENCODES a protein that exhibits protein C-terminus binding AND  sequence-specific DNA binding AND  ubiquitin protein ligase binding AND  INVOLVED IN aging AND  cellular response to organonitrogen compound AND  negative regulation of DNA biosynthetic process AND  PARTICIPATES IN altered p53 signaling pathway AND  endometrial cancer pathway AND  non-small cell lung cancer pathway AND  ASSOCIATED WITH Dementia  Vascular AND  Diabetic Nephropathies AND  Ischemia AND  FOUND IN chromatin AND  cytoplasm AND  cytosol AND  INTERACTS WITH (-)-citrinin AND  (-)-epigallocatechin 3-gallate AND  (R)-lipoic acid;fullName=tumor protein p53;Dbxref=PharmGKB:PA36679,EntrezGene:24842,UniGene:54443,EnsemblGenes:ENSRNOG00000010756,KEGGPathway:04010,KEGGPathway:04110,KEGGPathway:04115,KEGGPathway:04210,KEGGPathway:04310,KEGGPathway:04722,KEGGPathway:05014,KEGGPathway:05016,KEGGPathway:05200,KEGGPathway:05210,KEGGPathway:05212,KEGGPathway:05213,KEGGPathway:05214,KEGGPathway:05215,KEGGPathway:05216,KEGGPathway:05217,KEGGPathway:05218,KEGGPathway:05219,KEGGPathway:05220,KEGGPathway:05222,KEGGPathway:05223,IMAGE_CLONE:7193583,MGC_CLONE:BC081788,IMAGE_CLONE:7384467,MGC_CLONE:BC098663,UniProt:P10361,EnsemblGenes:ENSRNOG00000010756,KEGGPathway:05160,KEGGPathway:05162,KEGGPathway:05166,KEGGPathway:05168,KEGGPathway:04151,KEGGPathway:05161,KEGGPathway:05203,KEGGPathway:05202,KEGGPathway:05169,KEGGPathway:05205,UniProt:Q9JLD9;
10      RGD     mRNA    4816612 4817340 .       +       .       Name=NM_017057;Parent=RGD:3885;gene=Tnp2;RefSeqStatus=PROVISIONAL;Alias=RGD:2752358;ID=mRNARGD2752358_t00;isNon-Coding=N;
10      RGD     mRNA    56399721        56411150        .       +       .       Name=NM_030989;Parent=RGD:3889;gene=Tp53;RefSeqStatus=REVIEWED;Alias=RGD:2752318;ID=mRNARGD2752318_t00;isNon-Coding=N;
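
To confirm that the Note and Dbxref values are actually present in the input, the attribute column of a record like those above can be inspected independently of InterMine. This is a minimal Python sketch under a few assumptions: the columns are tab-separated as GFF3 requires, the sample line is a shortened copy of the first gene record, and parse_attributes and dbxrefs_by_source are hypothetical helpers, not part of any InterMine API. It does not decode URL-escaped characters.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> raw value string."""
    attributes = {}
    for field in column9.strip().strip(";").split(";"):
        if not field:
            continue  # tolerate empty fields such as ';;'
        key, _, value = field.partition("=")
        attributes[key] = value
    return attributes


def dbxrefs_by_source(dbxref_value):
    """Group 'Source:accession' pairs from a Dbxref value by source name."""
    grouped = {}
    for item in dbxref_value.split(","):
        source, _, accession = item.partition(":")
        grouped.setdefault(source, []).append(accession)
    return grouped


# Shortened version of the first gene record (Note and Dbxref abbreviated).
line = (
    "10\tRGD\tgene\t4816612\t4817340\t.\t+\t.\t"
    "Name=Tnp2;ID=RGD:3885;"
    "Note=ENCODES a protein that exhibits zinc ion binding;"
    "Dbxref=EntrezGene:24840,UniGene:10430,EnsemblGenes:ENSRNOG00000002566;"
)

columns = line.split("\t")
attrs = parse_attributes(columns[8])
xrefs = dbxrefs_by_source(attrs["Dbxref"])

print(attrs["ID"])            # RGD:3885
print(attrs["Note"])          # ENCODES a protein that exhibits zinc ion binding
print(xrefs["EntrezGene"])    # ['24840']
print(xrefs["EnsemblGenes"])  # ['ENSRNOG00000002566']
```

If the Note value shows up here but not in the loaded gene, the input file itself is probably fine and the issue is more likely in how the attribute is mapped.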



Pushkala Jayaraman
Programmer/Analyst - Rat Genome Database
Human and Molecular Genetics Center
Medical College of Wisconsin
414-955-2229
http://rgd.mcw.edu
