Human variant information (clinvar_20181202.vcf) was downloaded from the ClinVar database. Human-zebrafish gene orthologue
information (human_orthos_2018.12.06.txt) was obtained from ZFIN. The final output, conserved variants that can be modeled
in zebrafish, is split by orientation into all_variants_positive.txt and all_variants_negative.txt.
head -40 clinvar_20181202.vcf
##fileformat=VCFv4.1
##fileDate=2018-12-02
##source=ClinVar
##reference=GRCh38
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1014042 475283 G A . . ALLELEID=446939;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014042G>A;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=143888043
1 1014122 542074 C T . . ALLELEID=514926;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014122C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=150861311
1 1014143 183381 C T . . ALLELEID=181485;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014143C>T;CLNREVSTAT=no_assertion_criteria_provided;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;CLNVI=OMIM_Allelic_Variant:147571.0003;GENEINFO=ISG15:9636;MC=SO:0001587|nonsense;ORIGIN=1;RS=786201005
1 1014179 542075 C T . . ALLELEID=514896;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014179C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1
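The records above follow the layout declared in the header: eight columns, with the last one (INFO) holding `key=value` pairs separated by semicolons. As a minimal sketch of how such a line could be read, the Python below splits one record and pulls out the gene symbol and clinical significance. The function names `parse_info` and `parse_record` are illustrative and are not part of this repository; the example record is an abbreviated copy of the third data line shown above.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a key -> value dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Keys without "=" are flags; store them as True.
        fields[key] = value if sep else True
    return fields


def parse_record(line: str) -> dict:
    """Parse one VCF data line with the eight columns named in #CHROM."""
    # split(None, 7) accepts tabs or spaces and keeps INFO in one piece.
    chrom, pos, var_id, ref, alt, _qual, _filter, info = line.rstrip("\n").split(None, 7)
    fields = parse_info(info)
    # GENEINFO is "symbol:gene_id" pairs separated by "|".
    genes = [pair.split(":")[0] for pair in str(fields.get("GENEINFO", "")).split("|") if pair]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt,
        "genes": genes,
        "clnsig": fields.get("CLNSIG"),
        "clnvc": fields.get("CLNVC"),
    }


# Abbreviated copy of a record from the listing above (some INFO keys dropped).
example = "\t".join([
    "1", "1014143", "183381", "C", "T", ".", ".",
    "ALLELEID=181485;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;"
    "GENEINFO=ISG15:9636;MC=SO:0001587|nonsense;ORIGIN=1;RS=786201005",
])
record = parse_record(example)
print(record["genes"], record["ref"], record["alt"], record["clnsig"])
```

Header lines starting with `#` are not handled here and would need to be skipped before calling `parse_record`.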
head human_orthos_2018.12.06.txt
Date: 2018.12.06
ZFIN ID ZFIN Symbol ZFIN Name Human Symbol Human Name OMIM ID Gene ID HGNC ID Evidence Pub ID
ZDB-GENE-000112-47 ppardb peroxisome proliferator-activated receptor delta b PPARD peroxisome proliferator activated receptor delta 600409 5467 9235 AA ZDB-PUB-060313-16
ZDB-GENE-000112-47 ppardb peroxisome proliferator-activated receptor delta b PPARD peroxisome proliferator activated receptor delta 600409 5467 9235 AA ZDB-PUB-070210-39
ZDB-GENE-000112-47 ppardb peroxisome proliferator-activated receptor delta b PPARD peroxisome proliferator activated receptor delta 600409 5467 9235 AA ZDB-PUB-071118-46
ZDB-GENE-000112-47 ppardb peroxisome proliferator-activated receptor delta b PPARD peroxisome proliferator activated receptor delta 600409 5467 9235 AA ZDB-PUB-150121-5
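In the rows above the same gene pair is repeated once per supporting publication, so a lookup table needs to collapse duplicates. The sketch below builds a human-symbol to zebrafish-symbol map. It assumes the file is tab-separated (the listing above does not show the delimiter) with the column order given in the header line; `load_orthologues` is a hypothetical helper, not a function from this repository.

```python
import csv
import io
from collections import defaultdict


def load_orthologues(handle) -> dict:
    """Map each human gene symbol to a sorted list of zebrafish symbols."""
    mapping = defaultdict(set)
    # Assumption: fields are tab-separated; quoting is disabled so that
    # quote characters inside gene names are kept literally.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip the "Date:" line, the column header, and short or blank lines.
        if len(row) < 4 or not row[0].startswith("ZDB-"):
            continue
        zfin_symbol, human_symbol = row[1], row[3]
        mapping[human_symbol].add(zfin_symbol)
    return {human: sorted(symbols) for human, symbols in mapping.items()}


# Two rows copied from the listing above, rebuilt with tab separators.
sample = (
    "Date: 2018.12.06\n"
    "ZFIN ID\tZFIN Symbol\tZFIN Name\tHuman Symbol\tHuman Name\t"
    "OMIM ID\tGene ID\tHGNC ID\tEvidence\tPub ID\n"
    "ZDB-GENE-000112-47\tppardb\tperoxisome proliferator-activated receptor delta b\t"
    "PPARD\tperoxisome proliferator activated receptor delta\t"
    "600409\t5467\t9235\tAA\tZDB-PUB-060313-16\n"
    "ZDB-GENE-000112-47\tppardb\tperoxisome proliferator-activated receptor delta b\t"
    "PPARD\tperoxisome proliferator activated receptor delta\t"
    "600409\t5467\t9235\tAA\tZDB-PUB-070210-39\n"
)
print(load_orthologues(io.StringIO(sample)))
```

Using a set per human symbol keeps one entry per zebrafish gene while still allowing a human gene to map to several zebrafish paralogues.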
head all_variants_positive.txt
MYO15A ENST00000205890 myo15aa ENSDART00000149546 2382 A/V AGAGGGATTGCTGAAATTTGGcCGCCACCGTTGAggCCTGAT
DLD ENST00000205402 dldh ENSDART00000006709 60 A/V GTACCTTAAACCCAAGCTGAGcTGCTTTGATAGCggCAACAT
DLD ENST00000205402 dldh ENSDART00000006709 239 A/V CATGGCCAAGAAATTCCACTGcTGTAACTTTggCACCCAACC
ABCC6 ENST00000205557 abcc6a ENSDART00000189619 158 A/V ACTCTTCCTCAGCTGTTTTGcAGACCAggCACCCTTAggCAA
TSC2 ENST00000219476 tsc2 ENSDART00000158948 122 A/V TAGGGTGTACTCCGAGGACGcCGCTCTCCTCCGCggTGCTGT
TSC2 ENST00000219476 tsc2 ENSDART00000158948 264 A/V CACGCTGGTCTCCTACAGAGcTCAggCCATCCAGCCggCCAA
TSC2 ENST00000219476 tsc2 ENSDART00000158948 454 A/V TTTCCTGCTGCTGATGAGGGcTGATTCTCTACATCGTCTCgg
head all_variants_negative.txt
NEXMIF ENST00000055682 nexmifa ENSDART00000149893 347 G/D AccATGCAGTAGAAAGGAGGgCCCAAAAGAGAAACCAGACCA
B4GALT7 ENST00000029410 b4galt7 ENSDART00000170614 215 C/Y AGCGGTTTGACATccCGTTgCACTTTAGGCAGAAATCACAAG
B4GALT7 ENST00000029410 b4galt7 ENSDART00000170614 273 A/T ATATTAccTGTTTTTGTGCAgCGATCCGCTTCTGGTCTCTTT
OTC ENST00000039007 otc ENSDART00000089526 88 S/N TGGACATGCGGGTccTAGTgCTCCTCTTCTCAAATATCATGG
OTC ENST00000039007 otc ENSDART00000089526 172 A/T ccTGTAGTGTTAGCAGGTCAgCCAGAATCTGAATAGGGTGGT
OTC ENST00000039007 otc ENSDART00000089526 251 A/T GAACACTGCTGTccCGCGCTgCCTCCACTGGATCTGACACAA
SPG21 ENST00000204566 spg21 ENSDART00000130967 180 A/T TGTCAAccATGAAGTCAATTgCGTCTGCCATTTTTGGGTCCA
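The output rows have seven whitespace-separated fields. This README does not name the columns, so the field names in the sketch below are assumptions inferred from the values (human gene and transcript, zebrafish gene and transcript, a position, an amino-acid change, and a sequence). It also does not say what the lowercase letters in the sequence mark, so the helper only reports where they are. `VariantRow`, `parse_row` and `lowercase_runs` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Tuple


class VariantRow(NamedTuple):
    # Field names are guesses based on the sample rows, not documented names.
    human_gene: str
    human_transcript: str
    zebrafish_gene: str
    zebrafish_transcript: str
    position: int
    change: str
    sequence: str


def parse_row(line: str) -> VariantRow:
    """Parse one line of all_variants_positive.txt or all_variants_negative.txt."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    return VariantRow(parts[0], parts[1], parts[2], parts[3],
                      int(parts[4]), parts[5], parts[6])


def lowercase_runs(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start index, text) for each run of lowercase letters."""
    return [(m.start(), m.group()) for m in re.finditer(r"[a-z]+", sequence)]


# First row of all_variants_positive.txt as shown above.
row = parse_row(
    "MYO15A ENST00000205890 myo15aa ENSDART00000149546 2382 A/V "
    "AGAGGGATTGCTGAAATTTGGcCGCCACCGTTGAggCCTGAT"
)
print(row.human_gene, row.zebrafish_gene, row.change, lowercase_runs(row.sequence))
```

The same parser applies to both files, since the positive and negative listings share the same seven-field layout.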