Database request: Oncorhynchus nerka sockeye salmon

### Database requests

1. Organism name: Oncorhynchus nerka (https://www.ncbi.nlm.nih.gov/datasets/taxonomy/8023) (sockeye salmon)
2. Link to gene definition file (e.g. GTF / GFF / GenBank): https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/034/236/695/GCF_034236695.1_Oner_Uvic_2.0/GCF_034236695.1_Oner_Uvic_2.0_genomic.gtf.gz
3. Link to Genome FASTA file/s: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/034/236/695/GCF_034236695.1_Oner_Uvic_2.0/GCF_034236695.1_Oner_Uvic_2.0_genomic.fna.gz
4. Link to CDS FASTA file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/034/236/695/GCF_034236695.1_Oner_Uvic_2.0/GCF_034236695.1_Oner_Uvic_2.0_cds_from_genomic.fna.gz
5. Link to Protein FASTA file: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/034/236/695/GCF_034236695.1_Oner_Uvic_2.0/GCF_034236695.1_Oner_Uvic_2.0_protein.faa.gz

Thank you for any support on this request. I tried to build the database manually, but I could not get the build to complete without errors.
Specifically, the following command read `sequences.fa`, `cds.fa`, and `genes.gtf` without problems:  
`java -Xmx20g -jar snpEff.jar build -gtf22 -v Oner_Uvic_2.0 2>&1 | tee Oner_Uvic_2.0.build`  
However, the build then failed at the data-checking stage with the following error:
```
FATAL ERROR: No CDS checked. This might be caused by differences in FASTA file
transcript IDs respect to database's transcript's IDs.
Transcript IDs from database (sample):
        'unassigned_transcript_3780'
        'XM_029687664.2'
        'unassigned_transcript_3781'
        'unassigned_transcript_3778'
        'unassigned_transcript_3779'
        'unassigned_transcript_3784'
        'unassigned_transcript_3782'
        'unassigned_transcript_3783'
        'XM_065019568.1'
        'XM_029661614.2'
        'XM_029627598.2'
        'XR_003863663.2'
        'XR_003863664.2'
        'XM_065018188.1'
        'XM_029627603.2'
        'XM_029627600.2'
        'XM_065007313.1'
        'XM_029627601.2'
        'XM_029627599.2'
        'XM_029659812.2'
        'XM_029627610.2'
        'XM_029627611.2'
Transcript IDs from database (fasta file):
        'lcl|NC_088413.1_cds_XP_064859648.1_41028'
        'lcl|NC_088415.1_cds_XP_029477646.1_45063'
        'lcl|NC_088404.1_cds_XP_064878381.1_19670'
        'lcl|NC_088419.1_cds_XP_064865376.1_56512'
        'lcl|NC_088405.1_cds_XP_029525536.1_21965'
        'lcl|NC_088410.1_cds_XP_064858049.1_37672'
        'lcl|NC_088414.1_cds_XP_029503937.2_44172'
        'lcl|NC_088419.1_cds_XP_064864584.1_55919'
        'lcl|NW_027039711.1_cds_XP_064871287.1_68469'
        'lcl|NC_088423.1_cds_XP_064869196.1_65209'
        'lcl|NC_088404.1_cds_XP_029524279.2_20779'
        'lcl|NC_088398.1_cds_XP_064862988.1_4603'
        'lcl|NC_088402.1_cds_XP_064876276.1_14724'
        'lcl|NC_088415.1_cds_XP_029478572.1_46073'
        'lcl|NC_088415.1_cds_XP_064861311.1_44650'
        'lcl|NC_088410.1_cds_XP_064857172.1_35158'
        'lcl|NC_088418.1_cds_XP_064864437.1_53888'
        'lcl|NC_088407.1_cds_XP_029530877.1_27482'
        'lcl|NC_088417.1_cds_XP_029482230.1_49588'
        '1_cds_XP_029508511'
        '1_cds_XP_029508512'
        '1_cds_XP_029508515'
```

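Searching online, I found reports of similar errors. The cause appears to be that the CDS FASTA headers use long-form, protein-based accession IDs (e.g. `lcl|NC_088413.1_cds_XP_064859648.1_41028`) that do not match the transcript IDs used in the GTF file (e.g. `XM_029687664.2`). Unfortunately, I have not been able to resolve this.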
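
For anyone hitting the same mismatch, here is a minimal Python sketch of one possible workaround: rewriting the CDS FASTA headers so they carry the GTF transcript IDs. It rests on two assumptions that should be checked against the actual files: that each FASTA record ID has the form `lcl|<sequence accession>_cds_<protein accession>_<counter>` (as in the error output above), and that CDS rows in the GTF carry both a `transcript_id` and a `protein_id` attribute. The function names are hypothetical, and this has not been confirmed to make the snpEff build pass.

```python
import re

# Assumption: record IDs look like lcl|<seq accession>_cds_<protein accession>_<counter>.
CDS_ID = re.compile(r"_cds_(?P<protein>.+)_\d+$")
# Matches GTF attributes of the form: key "value";
ATTR = re.compile(r'(\w+) "([^"]*)"')


def protein_from_header(header):
    """Return the protein accession embedded in a FASTA header, or None."""
    parts = header.lstrip(">").split()
    if not parts:
        return None
    match = CDS_ID.search(parts[0])
    return match.group("protein") if match else None


def protein_to_transcript(gtf_lines):
    """Map protein_id -> transcript_id using the CDS rows of a GTF file.

    Assumption: CDS rows carry both attributes; rows lacking either are skipped.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        attrs = dict(ATTR.findall(fields[8]))
        if "protein_id" in attrs and "transcript_id" in attrs:
            mapping[attrs["protein_id"]] = attrs["transcript_id"]
    return mapping


def rename_headers(fasta_lines, mapping):
    """Yield FASTA lines, replacing each mappable header with its transcript ID.

    Headers that cannot be mapped are passed through unchanged.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            transcript = mapping.get(protein_from_header(line))
            if transcript is not None:
                line = f">{transcript}\n"
        yield line


def rewrite_cds(gtf_path, cds_in, cds_out):
    """Read an uncompressed GTF and CDS FASTA, write a FASTA with renamed headers."""
    with open(gtf_path) as gtf:
        mapping = protein_to_transcript(gtf)
    with open(cds_in) as src, open(cds_out, "w") as dst:
        dst.writelines(rename_headers(src, mapping))
```

With the uncompressed files in place, a call such as `rewrite_cds("genes.gtf", "cds.fa", "cds.renamed.fa")` would produce a candidate replacement for `cds.fa` (the output name is arbitrary). Unmapped headers are left untouched rather than dropped, so any records that still fail to match remain visible in the build log.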

Thank you very much for any help with this and please let me know if I can provide any additional information to help this genome be included within the snpEff genomic database. 
Ben

Database request: Oncorhynchus nerka sockeye salmon #628
