I was trying to create a BLAST database from the pre-compiled COI database CO1_combined_derep_and_clean.fasta. After cleanup, the FASTA sequence headers look like >KM252950.1_representative_of_12_identical_accessions. When I used this file as the input FASTA for makeblastdb, with the command
makeblastdb -in CO1_combined_derep_and_clean.fasta -dbtype nucl -parse_seqids -out rCRUX_COI -title "rCRUX_Leray"
I got an error message complaining about the length of the sequence headers:
"BLAST Database creation error: Near line 1, the local id is too long. Its length is 52 but the maximum allowed local id length is 50. Please find and correct all local ids that are too long."
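The error only reports the first offending line, so it can help to list every header that is over the limit before editing anything. Below is a minimal sketch, assuming the ID that makeblastdb checks is the text between ">" and the first whitespace, and that the limit is the 50 characters quoted in the error message. long_ids is a hypothetical helper name, not part of BLAST+ or rCRUX.

```shell
# Hypothetical helper: print "<length><TAB><id>" for every FASTA header
# whose ID (text after '>' up to the first whitespace) exceeds 50 characters.
long_ids() {
  awk -v max=50 '
    /^>/ {
      id = substr($1, 2)            # drop the leading ">"
      if (length(id) > max)
        print length(id) "\t" id
    }
  ' "$1"
}
```

Running long_ids CO1_combined_derep_and_clean.fasta | wc -l would then give the number of affected headers, and running it again on the shortened file should print nothing.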
I confirmed this was the issue by shortening the sequence headers with sed:
sed 's/_representative_of//g' CO1_combined_derep_and_clean.fasta > CO1_ready.fasta
To save future Moncho a headache, I also shortened the headers in the taxonomy file so the IDs still match:
sed 's/_representative_of//g' CO1_combined_derep_and_clean_taxonomy.txt > CO1_tax_ready.txt
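Now the headers read >KM252950.1_12_identical_accessions (34 characters, well under the limit).
I think we could modify derep_and_clean to account for this, so that the headers it writes stay within the 50-character limit.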