zebrafishCC/findConservedProteinVariant

Human variant information (clinvar_20181202.vcf) was downloaded from the ClinVar database. Human and zebrafish gene orthologue
information (human_orthos_2018.12.06.txt) was obtained from ZFIN. The final output, conserved variants that can be generated
in zebrafish, is separated according to orientation (all_variants_positive.txt and all_variants_negative.txt).

head -40 clinvar_20181202.vcf
##fileformat=VCFv4.1
##fileDate=2018-12-02
##source=ClinVar
##reference=GRCh38
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
1	1014042	475283	G	A	.	.	ALLELEID=446939;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014042G>A;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=143888043
1	1014122	542074	C	T	.	.	ALLELEID=514926;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014122C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1;RS=150861311
1	1014143	183381	C	T	.	.	ALLELEID=181485;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014143C>T;CLNREVSTAT=no_assertion_criteria_provided;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;CLNVI=OMIM_Allelic_Variant:147571.0003;GENEINFO=ISG15:9636;MC=SO:0001587|nonsense;ORIGIN=1;RS=786201005
1	1014179	542075	C	T	.	.	ALLELEID=514896;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.11:g.1014179C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Uncertain_significance;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;MC=SO:0001583|missense_variant;ORIGIN=1
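The records above can be read with a few lines of Python. The following is a minimal sketch, not the repository's own code: the function name `parse_clinvar_record` and the keys of the returned dict are assumptions made for illustration. It relies only on the layout described in the header lines above (eight tab-separated columns, `;`-separated INFO entries, `GENEINFO` as `symbol:id` pairs delimited by `|`, and `MC` as a comma-separated list of `SO id|consequence`).

```python
def parse_clinvar_record(line):
    """Split one tab-separated ClinVar VCF data line into a dict (illustrative helper)."""
    chrom, pos, var_id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")

    # INFO is a ';'-separated list of KEY=VALUE entries.
    info_fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)

    # GENEINFO holds "symbol:id" pairs delimited by '|'; keep only the symbols.
    genes = [pair.split(":")[0]
             for pair in info_fields.get("GENEINFO", "").split("|") if pair]

    # MC holds comma-separated "SO id|consequence" entries; keep the consequence.
    consequences = [entry.split("|", 1)[1]
                    for entry in info_fields.get("MC", "").split(",") if "|" in entry]

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt,
        "genes": genes,
        "clnsig": info_fields.get("CLNSIG"),
        "consequences": consequences,
    }


# First data line of clinvar_20181202.vcf as shown above.
record = "\t".join([
    "1", "1014042", "475283", "G", "A", ".", ".",
    "ALLELEID=446939;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;"
    "CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;"
    "CLNHGVS=NC_000001.11:g.1014042G>A;"
    "CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Benign;"
    "CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=ISG15:9636;"
    "MC=SO:0001583|missense_variant;ORIGIN=1;RS=143888043",
])
parsed = parse_clinvar_record(record)
print(parsed["genes"], parsed["ref"], parsed["alt"], parsed["consequences"])
```

Lines starting with `#` are header lines and would need to be skipped before calling the helper.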


head human_orthos_2018.12.06.txt 
Date: 2018.12.06
ZFIN ID	ZFIN Symbol	ZFIN Name	Human Symbol	Human Name	OMIM ID	Gene ID	HGNC ID	Evidence	Pub ID
ZDB-GENE-000112-47	ppardb	peroxisome proliferator-activated receptor delta b	PPARD	peroxisome proliferator activated receptor delta	600409	5467	9235	AA	ZDB-PUB-060313-16
ZDB-GENE-000112-47	ppardb	peroxisome proliferator-activated receptor delta b	PPARD	peroxisome proliferator activated receptor delta	600409	5467	9235	AA	ZDB-PUB-070210-39
ZDB-GENE-000112-47	ppardb	peroxisome proliferator-activated receptor delta b	PPARD	peroxisome proliferator activated receptor delta	600409	5467	9235	AA	ZDB-PUB-071118-46
ZDB-GENE-000112-47	ppardb	peroxisome proliferator-activated receptor delta b	PPARD	peroxisome proliferator activated receptor delta	600409	5467	9235	AA	ZDB-PUB-150121-5
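The same orthologue pair is repeated once per supporting publication, so a lookup table needs to collapse duplicates. A minimal sketch of one way to do that is shown here; the function name `load_orthologues` is hypothetical, and the column positions (ZFIN symbol in the second column, human symbol in the fourth) are taken from the rows shown above.

```python
from collections import defaultdict


def load_orthologues(lines):
    """Map each human gene symbol to the set of zebrafish symbols listed for it."""
    human_to_zebrafish = defaultdict(set)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Skip the "Date:" line, the column header, and blank lines:
        # data rows start with a ZFIN gene ID.
        if len(row) < 4 or not row[0].startswith("ZDB-GENE-"):
            continue
        zfin_symbol, human_symbol = row[1], row[3]
        human_to_zebrafish[human_symbol].add(zfin_symbol)
    return dict(human_to_zebrafish)


# Two of the rows shown above; they differ only in the publication ID.
example_lines = [
    "Date: 2018.12.06",
    "\t".join(["ZFIN ID", "ZFIN Symbol", "ZFIN Name", "Human Symbol", "Human Name",
               "OMIM ID", "Gene ID", "HGNC ID", "Evidence", "Pub ID"]),
    "\t".join(["ZDB-GENE-000112-47", "ppardb",
               "peroxisome proliferator-activated receptor delta b", "PPARD",
               "peroxisome proliferator activated receptor delta",
               "600409", "5467", "9235", "AA", "ZDB-PUB-060313-16"]),
    "\t".join(["ZDB-GENE-000112-47", "ppardb",
               "peroxisome proliferator-activated receptor delta b", "PPARD",
               "peroxisome proliferator activated receptor delta",
               "600409", "5467", "9235", "AA", "ZDB-PUB-070210-39"]),
]
orthologues = load_orthologues(example_lines)
print(orthologues)
```

For the real file, the function can be given an open file handle, for example `load_orthologues(open("human_orthos_2018.12.06.txt"))`.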


head all_variants_positive.txt 
MYO15A	ENST00000205890	myo15aa	ENSDART00000149546	2382	A/V	AGAGGGATTGCTGAAATTTGGcCGCCACCGTTGAggCCTGAT
DLD	ENST00000205402	dldh	ENSDART00000006709	60	A/V	GTACCTTAAACCCAAGCTGAGcTGCTTTGATAGCggCAACAT
DLD	ENST00000205402	dldh	ENSDART00000006709	239	A/V	CATGGCCAAGAAATTCCACTGcTGTAACTTTggCACCCAACC
ABCC6	ENST00000205557	abcc6a	ENSDART00000189619	158	A/V	ACTCTTCCTCAGCTGTTTTGcAGACCAggCACCCTTAggCAA
TSC2	ENST00000219476	tsc2	ENSDART00000158948	122	A/V	TAGGGTGTACTCCGAGGACGcCGCTCTCCTCCGCggTGCTGT
TSC2	ENST00000219476	tsc2	ENSDART00000158948	264	A/V	CACGCTGGTCTCCTACAGAGcTCAggCCATCCAGCCggCCAA
TSC2	ENST00000219476	tsc2	ENSDART00000158948	454	A/V	TTTCCTGCTGCTGATGAGGGcTGATTCTCTACATCGTCTCgg

head all_variants_negative.txt
NEXMIF	ENST00000055682	nexmifa	ENSDART00000149893	347	G/D	AccATGCAGTAGAAAGGAGGgCCCAAAAGAGAAACCAGACCA
B4GALT7	ENST00000029410	b4galt7	ENSDART00000170614	215	C/Y	AGCGGTTTGACATccCGTTgCACTTTAGGCAGAAATCACAAG
B4GALT7	ENST00000029410	b4galt7	ENSDART00000170614	273	A/T	ATATTAccTGTTTTTGTGCAgCGATCCGCTTCTGGTCTCTTT
OTC	ENST00000039007	otc	ENSDART00000089526	88	S/N	TGGACATGCGGGTccTAGTgCTCCTCTTCTCAAATATCATGG
OTC	ENST00000039007	otc	ENSDART00000089526	172	A/T	ccTGTAGTGTTAGCAGGTCAgCCAGAATCTGAATAGGGTGGT
OTC	ENST00000039007	otc	ENSDART00000089526	251	A/T	GAACACTGCTGTccCGCGCTgCCTCCACTGGATCTGACACAA
SPG21	ENST00000204566	spg21	ENSDART00000130967	180	A/T	TGTCAAccATGAAGTCAATTgCGTCTGCCATTTTTGGGTCCA
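Both output files share the same seven tab-separated columns. The sketch below reads one row and reports where the lowercase bases sit in the sequence. The field names are guesses based on the values shown (gene symbols, Ensembl transcript IDs, a position, a residue change, and a 42-base sequence) and are not documented by the repository. Reading the lowercase letters as the target base plus an adjacent `gg` or `cc` pair is likewise an assumption.

```python
from typing import NamedTuple


class ConservedVariant(NamedTuple):
    # Field names are illustrative; the files themselves have no header row.
    human_gene: str
    human_transcript: str
    zebrafish_gene: str
    zebrafish_transcript: str
    position: int
    residue_change: str
    sequence: str


def parse_variant_line(line):
    """Turn one tab-separated output row into a ConservedVariant."""
    fields = line.rstrip("\n").split("\t")
    return ConservedVariant(fields[0], fields[1], fields[2], fields[3],
                            int(fields[4]), fields[5], fields[6])


def lowercase_positions(sequence):
    """Return the 0-based indices of the lowercase bases in a sequence."""
    return [i for i, base in enumerate(sequence) if base.islower()]


# First row of all_variants_positive.txt as shown above.
row = "\t".join(["MYO15A", "ENST00000205890", "myo15aa", "ENSDART00000149546",
                 "2382", "A/V", "AGAGGGATTGCTGAAATTTGGcCGCCACCGTTGAggCCTGAT"])
variant = parse_variant_line(row)
marked = lowercase_positions(variant.sequence)
print(variant.human_gene, variant.residue_change, marked)  # MYO15A A/V [21, 34, 35]
```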

About

Finds protein variants conserved between zebrafish and human that are suitable for C-to-T base editing.
