[InterMine Dev] Ensembl data

Jayaraman, Pushkala pjayaraman at mcw.edu
Thu Mar 28 19:45:36 GMT 2013


Hi Fengyuan,
I have a question about the details on the Ensembl SNP page:

# load data into db
$ mysql -u DB_USER -p homo_sapiens_variation_70 < homo_sapiens_variation_70_37.sql
$ mysqlimport -u DB_USER -p homo_sapiens_variation_70 -L *.txt -v

What are the *.txt files? What do they contain? Is it everything in the homo_sapiens_variation_70_37 folder?
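For context on the command above: mysqlimport picks the target table from each file name by dropping the directory and the extension, so `*.txt` stands for one data file per table. Below is a minimal sketch of that mapping in Python. The file names are hypothetical examples and not a listing of the actual dump folder, and `table_for` is an illustrative helper, not part of any MySQL tool.

```python
from pathlib import Path


def table_for(dump_file: str) -> str:
    """Return the table name implied by a dump file name.

    Mirrors the documented mysqlimport behaviour for simple names:
    drop the directory part and the extension.
    """
    return Path(dump_file).stem


# Hypothetical file names, for illustration only.
files = ["variation_feature.txt", "transcript_variation.txt", "source.txt"]
mapping = {f: table_for(f) for f in files}

for file_name, table in mapping.items():
    print(f"{file_name} -> table {table}")
```

Under that assumption, every .txt file in the folder would be loaded into the table of the same name that the .sql schema file created.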

The variation database can be large and slow to query. One way to speed up queries is to create precomputed tables; this normally takes ~1.5 hours for human SNPs:
# precompute tables
$ mysql -u DB_USER -p
mysql> use homo_sapiens_variation_70;

mysql> CREATE TABLE mM_snp_tmp_no_order_chr_all
    SELECT vf.variation_feature_id, vf.variation_name, vf.variation_id,
           vf.allele_string, sr.name AS seq_region_name, vf.map_weight,
           vf.seq_region_start, vf.seq_region_end, vf.seq_region_strand,
           s.name AS source_name, vf.validation_status,
           vf.consequence_types AS variation_feature_consequence_types,
           tv.cdna_start,
           tv.consequence_types AS transcript_variation_consequence_types,
           tv.pep_allele_string, tv.feature_stable_id,
           tv.sift_prediction, tv.sift_score,
           tv.polyphen_prediction, tv.polyphen_score
    FROM seq_region sr, source s, variation_feature vf
    LEFT JOIN (transcript_variation tv)
           ON (vf.variation_feature_id = tv.variation_feature_id
               AND tv.consequence_types NOT IN ('5KB_downstream_variant', '5KB_upstream_variant', '500B_downstream_variant', '2KB_upstream_variant'))
    WHERE vf.seq_region_id = sr.seq_region_id
      AND vf.source_id = s.source_id;

mysql> CREATE TABLE mM_snp_tmp_ordered_chr_all SELECT * FROM mM_snp_tmp_no_order_chr_all ORDER BY seq_region_name, variation_id;
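The precompute step above can be sketched on a toy schema: the join is run once and its result is stored as an extra, denormalized table alongside the tables defined by the schema file. The sketch below uses Python's built-in sqlite3 instead of MySQL, a reduced column set, and made-up rows; the table name `snp_precomputed` and identifiers such as `ENST_A` are hypothetical. SQLite requires the `CREATE TABLE ... AS SELECT` spelling, whereas MySQL also accepts the form without `AS` used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy stand-ins for the normalized source tables (reduced columns).
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source (source_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE variation_feature (
    variation_feature_id INTEGER PRIMARY KEY,
    variation_id INTEGER, variation_name TEXT,
    seq_region_id INTEGER, source_id INTEGER);
CREATE TABLE transcript_variation (
    variation_feature_id INTEGER, feature_stable_id TEXT,
    consequence_types TEXT);

-- Made-up rows, for illustration only.
INSERT INTO seq_region VALUES (1, '1'), (2, '2');
INSERT INTO source VALUES (1, 'dbSNP');
INSERT INTO variation_feature VALUES
    (10, 100, 'rs100', 2, 1),
    (11, 101, 'rs101', 1, 1);
INSERT INTO transcript_variation VALUES
    (10, 'ENST_A', 'missense_variant'),
    (11, 'ENST_B', '5KB_upstream_variant');

-- Precompute the join once and keep the result as its own table.
CREATE TABLE snp_precomputed AS
SELECT vf.variation_feature_id, vf.variation_name, vf.variation_id,
       sr.name AS seq_region_name, s.name AS source_name,
       tv.feature_stable_id, tv.consequence_types
FROM variation_feature vf
JOIN seq_region sr ON vf.seq_region_id = sr.seq_region_id
JOIN source s ON vf.source_id = s.source_id
LEFT JOIN transcript_variation tv
       ON vf.variation_feature_id = tv.variation_feature_id
      AND tv.consequence_types NOT IN ('5KB_upstream_variant');
""")

# Later reads hit the single precomputed table; no join is needed.
rows = conn.execute(
    "SELECT variation_name, seq_region_name, feature_stable_id "
    "FROM snp_precomputed ORDER BY seq_region_name, variation_id"
).fetchall()
print(rows)
```

The variation whose only transcript row has an excluded consequence type still appears in the result, with empty transcript columns, because the filter sits in the LEFT JOIN condition and not in the WHERE clause.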


Also, why do you have to create tables? The .sql file does that for you, right?


Pushkala

From: Fengyuan Hu [mailto:fh293 at cam.ac.uk]
Sent: Thursday, March 28, 2013 4:58 AM
To: Jayaraman, Pushkala
Cc: Julie Sullivan
Subject: Re: Ensembl data

Hi Pushkala,

I've updated the Ensembl SNP<http://intermine.readthedocs.org/en/latest/database/data-sources/library/variation/ensembl-snp/> page. Please take a look and let me know if anything is unclear.

Cheers
Fengyuan


