---------------------Processing started----------------------
Initial checks successful
------------Opening the gene presence/absence file------------
Core genes must be found in 530 or more genomes
Low frequency genes must be found in less than 27 genomes
Traceback (most recent call last):
File "/home/ahhayes/.conda/envs/Corekaburra/bin/Corekaburra", line 8, in <module>
sys.exit(main())
File "/home/ahhayes/.conda/envs/Corekaburra/lib/python3.9/site-packages/Corekaburra/__main__.py", line 173, in main
core_dict, low_freq_dict, acc_gene_dict = read_gene_presence_absence(input_pres_abs_file_path, args.core_cutoff,
File "/home/ahhayes/.conda/envs/Corekaburra/lib/python3.9/site-packages/Corekaburra/parse_gene_presence_absence.py", line 185, in read_gene_presence_absence
fragments_close = check_fragmented_gene(fragment_info, input_gffs, tmp_folder_path, logger)
File "/home/ahhayes/.conda/envs/Corekaburra/lib/python3.9/site-packages/Corekaburra/parse_gene_presence_absence.py", line 67, in check_fragmented_gene
first_fragment_contig = gff_database[fragment_pieces[0]][0]
File "/home/ahhayes/.conda/envs/Corekaburra/lib/python3.9/site-packages/gffutils/interface.py", line 280, in __getitem__
raise FeatureNotFoundError(key)
gffutils.exceptions.FeatureNotFoundError: SF370_01117
# if '.gff' not in genome:
# try:
# gff_file = [file for file in input_gffs if f'{genome}.gff' in file][0]
# db_name = os.path.join(tmp_folder_path, f'{genome}_db')
# except IndexError:
# raise NotImplementedError(f'No gff match was found when searching fragments for genome: {genome}')
# else:
# gff_file = genome
# db_name = f"{os.path.basename(genome)}_db"
# db_name = os.path.join(tmp_folder_path, db_name)
if '.gff' not in genome:
try:
gff_file = next(file for file in input_gffs if os.path.basename(file) == f'{genome}.gff')
db_name = os.path.join(tmp_folder_path, f'{genome}_db')
except StopIteration:
logger.error(f"No exact gff match was found for genome: {genome}")
logger.debug(f"Available GFF files: {[os.path.basename(f) for f in input_gffs]}")
raise NotImplementedError(f'No exact gff match was found for genome: {genome}')
else:
gff_file = genome
db_name = f"{os.path.basename(genome)}_db"
db_name = os.path.join(tmp_folder_path, db_name)
will aim to make change in forked version and pull request bugfix soon but wanted to document the issue ASAP incase anyone else had the same issue.
Corekaburra version 0.0.5 (and presumably earlier)
ERROR MESSAGE:
gff_file = [file for file in input_gffs if f'{genome}.gff' in file][0]gff_file = next(file for file in input_gffs if os.path.basename(file) == f'{genome}.gff')(old code commented out - some extra troubleshooting error messages also suggested) - when tested with the fix on a problematic dataset corekaburra ran without any crashing.will aim to make change in forked version and pull request bugfix soon but wanted to document the issue ASAP incase anyone else had the same issue.