Do you happen to have taxon identifiers for the training data in this file?
https://genome.cshlp.org/content/suppl/2018/03/22/gr.228429.117.DC1/Supplemental_Table_S1.xlsx
At the moment it's hard to tell which sequences correspond to each of these organism names.
Here's the list, but the names are difficult to search on their own. Did you download the sequences from NCBI or from another database?
Organisms
Streptococcus pneumoniae TIGR4
Clostridium acetobutylicum ATCC 824
Staphylococcus carnosus subsp. carnosus TM300
Mycobacterium bovis BCG Pasteur 1173P2
Staphylococcus aureus subsp. aureus Mu50
Orientia tsutsugamushi str. Ikeda
Colwellia psychrerythraea 34H
Synechococcus sp. CC9605
Nitrosococcus oceani ATCC 19707
Anaeromyxobacter dehalogenans 2CP-C
Mycobacterium vanbaalenii PYR-1
Prochlorococcus marinus str. AS9601
Prochlorococcus marinus str. MIT 9515
Shewanella loihica PV-4
Rickettsia akari str. Hartford
Coxiella burnetii RSA 331
Synechococcus sp. PCC 7002
Rhodobacter sphaeroides KD131
Sulfurihydrogenibium azorense Az-Fu1
Vibrio cholerae M66-2
Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774
Ammonifex degensii KC4
Thermobaculum terrenum ATCC BAA-798
Bacillus megaterium QM B1551
Lactobacillus casei BL23
Streptococcus equi subsp. equi 4047
Streptococcus gallolyticus UCN34
Gluconacetobacter diazotrophicus PAl 5
Helicobacter mustelae 12198
Agrobacterium tumefaciens str. C58
Syntrophothermus lipocalidus DSM 12680
Shigella sonnei Ss046
Legionella pneumophila 2300/99 Alcoy
Prevotella melaninogenica ATCC 25845
Acetohalobium arabaticum DSM 5501
Corynebacterium pseudotuberculosis 1002
Cyanothece sp. PCC 7822
Helicobacter pylori SJM180
Streptococcus parasanguinis ATCC 15912
Alicycliphilus denitrificans BC
Escherichia coli O83:H1 str. NRG 857C
Mycoplasma bovis PG45 clone MU clone A2
Bacillus subtilis BSn5
Thermus scotoductus SA-01
Staphylococcus pseudintermedius ED99
Acinetobacter calcoaceticus PHEA-2
Lactobacillus amylovorus strain 30SC
Mycobacterium tuberculosis H37Rv
Lactobacillus buchneri NRRL B-30929
Leuconostoc sp. C2
Weissella koreensis KACC 15510
Collimonas fungivorans Ter331
Streptococcus suis SS12
Salmonella enterica subsp. enterica serovar Typhimurium str. 798
Emticicia oligotrophica DSM 17448
Aequorivita sublithincola DSM 14238
Terriglobus roseus DSM 18391
Pantoea ananatis LMG 5342
Escherichia coli ST131
Mycoplasma gallisepticum VA94_7994-1-7P
Mycoplasma genitalium M2288
Thermus oshimai JL-2
Geitlerinema sp. PCC 7407
Chlamydia trachomatis L1/224
Mycoplasma pneumoniae PO1
Thermoanaerobacterium thermosaccharolyticum M0795
Candidatus Blochmannia chromaiodes str. 640
Bacillus subtilis XF-1
Sinorhizobium meliloti 2011
Raoultella ornithinolytica B6
Lactobacillus rhamnosus LOCK908
Streptococcus agalactiae 09mas018883
Streptococcus agalactiae ILRI005
Listeria monocytogenes strain N1-011A
Chlamydia trachomatis RC-F(s)/342
Proteus mirabilis BB2000
Staphylococcus aureus subsp. aureus SA957
Mycoplasma parvum str. Indiana
Halyomorpha halys symbiont DNA
Lactococcus lactis subsp. lactis KLDS 4.0325
Burkholderia pseudomallei NCTC 13179
Rhizobium leguminosarum bv. trifolii CB782
Listeria monocytogenes WSLC1042
Salmonella enterica subsp. enterica serovar Enteritidis str. EC20120005
Salmonella enterica subsp. enterica serovar Enteritidis str. EC20110361
Salmonella enterica subsp. enterica serovar Enteritidis str. EC20110353
Azospirillum brasilense strain Az39
Brucella canis strain SVA13
Acinetobacter baumannii strain AC29
Bacillus methanolicus MGA3
Neorhizobium galegae
Enterococcus faecium T110
Flavobacterium psychrophilum strain CSF259-93
Acinetobacter baumannii strain AB030
Burkholderia mallei strain FMH 23344
Burkholderia cepacia strain DDS 7H-2
Paenibacillus sp. FSL H7-0737
Burkholderia cenocepacia strain DWS 0.37
Streptococcus pyogenes strain 7F7
Bacillus cereus strain 03BB87
Hymenobacter sp. DG25B
Staphylococcus aureus strain 33b
Francisella guangzhouensis strain 08HL01032
Staphylococcus hyicus strain ATCC 11249
Aeromonas hydrophila J-1
Acinetobacter baumannii NCGM 237
Bacillus coagulans DSM 1 = ATCC 7050
Mycoplasma capricolum subsp. capripneumoniae 87001
Xanthomonas citri subsp. citri strain MN10
Xanthomonas citri subsp. citri strain AW14
Mannheimia haemolytica strain 89010807N lktA-
Mycoplasma gallinaceum strain B2096 8B
Oleispira antarctica strain RB-8
Xenorhabdus poinarii str. G6
Staphylococcus aureus strain FCFHV36
Thermotoga maritima strain Tma200
Streptococcus agalactiae strain SS1
Xanthomonas oryzae pv. oryzicola strain CFBP2286
Chlamydia trachomatis D/CS637/11
Planococcus sp. L10.15
Pseudomonadaceae bacterium C6819
Mycobacterium bovis BCG strain Russia 368
Streptococcus mitis strain KCOM 1350 (= ChDC B183)
Bifidobacterium breve strain BR3
Synechocystis sp. PCC 6803 substrain GT-G
Burkholderia cepacia ATCC 25416 chromosome 1
Campylobacter jejuni strain CJM1cam
Corynebacterium pseudotuberculosis strain 1002B
Rickettsia rhipicephali strain HJ#5
Bacillus amyloliquefaciens strain MBE1283
Salmonella enterica subsp. enterica serovar Enteritidis strain CMCC50041
Campylobacter jejuni strain CJ677CC532
Bacillus cereus strain FORC_013
Klebsiella oxytoca DNA complete genome strain: JKo3
Streptococcus mutans strain NG8
Achromobacter xylosoxidans strain FDAARGOS_162
Alteromonas sp. Mac1
Rhodobacter sphaeroides strain MBTLJ-8
Rickettsia prowazekii strain Naples-1
Bordetella pertussis strain E476
Bordetella pertussis strain I480
Bordetella pertussis strain I669
Mycobacterium abscessus strain FLAC003
Streptomyces ambofaciens strain DSM 40697
Azospirillum humicireducens strain SgZ-5
Rhizobium phaseoli strain R611
Lactobacillus plantarum strain NCU116
Bacillus anthracis strain Parent1
Bacillus anthracis strain PR01
Serinicoccus sp. JLT9
Vibrio scophthalmi strain VS-12
Enterococcus faecalis strain KB1
Candidatus Tremblaya princeps isolate TPMHIR1
Plesiomonas shigelloides strain NCTC10360
Atribacteria bacterium SCGC AAA255-E04
Aerophobetes bacterium JGI 0000014-C22
Candidate division TM6 bacterium GW2011_GWF2_30_66 UR12_C0001
Candidate division WS6 bacterium GW2011_GWC1_36_11 UR96_C0001
Berkelbacteria bacterium GW2011_GWA1_36_9 US31_C0001
Candidatus Falkowbacteria bacterium GW2011_GWC2_38_22 US83_C0001
Candidatus Curtissbacteria bacterium GW2011_GWC2_38_9 UT12_C0001
Candidate division WS6 bacterium GW2011_GWF2_39_15 UT34_C0001
Candidatus Daviesbacteria bacterium GW2011_GWA2_39_33 UT45_C0001
Candidate division CPR2 bacterium GW2011_GWD1_39_7 UT59_C0001
Candidatus Levybacteria bacterium GW2011_GWB1_41_21 UU52_C0001
Candidatus Giovannonibacteria bacterium GW2011_GWC2_44_9 UW81_C0001
Candidatus Gottesmanbacteria bacterium GW2011_GWA2_42_18 UV09_C0001
Candidatus Magasanikbacteria bacterium GW2011_GWC2_42_27 UV18_C0001
Candidate division WWE3 bacterium GW2011_GWA1_43_94 UW13_C0001
Candidatus Collierbacteria bacterium GW2011_GWA1_45_15 UW96_C0001
Candidatus Azambacteria bacterium GW2011_GWD2_46_48 UX56_C0001
Candidate division Kazan bacterium GW2011_GWC1_52_13 VE99_C0001
Candidate division WOR_3 bacterium SM1_77 WOR1_30_36_10180
Candidate division WOR-1 bacterium RIFOXYB2_FULL_36_35
Candidate division CPR3 bacterium RIFOXYB2_FULL_35_8
Candidate division WWE3 bacterium RIFCSPHIGHO2_12_FULL_38_15
Candidate division WWE3 bacterium RIFOXYB1_FULL_42_27
Candidatus Abawacabacteria bacterium RBG_16_42_10
Candidatus Amesbacteria bacterium RIFCSPLOWO2_01_FULL_48_50
Candidatus Adlerbacteria bacterium RIFCSPHIGHO2_12_FULL_53_18
Candidatus Beckwithbacteria bacterium RIFCSPHIGHO2_02_FULL_49_13
Candidatus Daviesbacteria bacterium RIFCSPLOWO2_01_FULL_39_23
Candidatus Doudnabacteria bacterium RIFCSPHIGHO2_01_FULL_46_24
Candidatus Firestonebacteria bacterium GWA2_43_8
Candidatus Glassbacteria bacterium RBG_16_58_8
Candidatus Blackburnbacteria bacterium RIFCSPHIGHO2_02_FULL_44_20
Candidatus Chisholmbacteria bacterium RIFCSPHIGHO2_01_FULL_48_12
Candidatus Brennerbacteria bacterium RIFOXYD1_FULL_41_16
Candidatus Buchananbacteria bacterium RIFCSPHIGHO2_02_FULL_56_16
Candidatus Colwellbacteria bacterium RIFCSPHIGHO2_02_FULL_45_17
Candidatus Harrisonbacteria bacterium RIFCSPLOWO2_01_FULL_40_28
Candidatus Komeilibacteria bacterium RIFOXYD1_FULL_37_29
Candidatus Liptonbacteria bacterium RIFCSPHIGHO2_01_FULL_57_28
Candidatus Komeilibacteria bacterium RIFOXYD2_FULL_37_8
Methanococcus voltae A3
Methanosarcina mazei strain Goe1
Methanopyrus kandleri AV19
Pyrobaculum aerophilum str. IM2
Nanoarchaeum equitans Kin4-M
Picrophilus torridus DSM 9790
Methanothermobacter thermautotrophicus str. Delta H
Archaeoglobus fulgidus DSM 4304
Methanocella paludicola SANAE DNA
Pyrococcus horikoshii OT3 DNA
Aeropyrum pernix K1 DNA
Methanococcus maripaludis strain S2
Methanococcoides burtonii DSM 6242
Hyperthermus butylicus DSM 5456
Thermofilum pendens Hrk 5
Methanocorpusculum labreanum Z
Methanoculleus marisnigri JR1
Methanococcus maripaludis C5
Pyrobaculum arsenaticum DSM 13514
Methanobrevibacter smithii ATCC 35061
Methanococcus vannielii SB
Methanococcus aeolicus Nankai-3
Methanococcus maripaludis C7
Candidatus Methanoregula boonei 6A8
Ignicoccus hospitalis KIN4/I
Caldivirga maquilingensis IC-167
Thermococcus onnurineus NA1
Nitrosopumilus maritimus SCM1
Methanococcus maripaludis C6
Candidatus Korarchaeum cryptofilum OPF8
Thermoproteus neutrophilus V24Sta
Desulfurococcus kamchatkensis 1221n
Candidatus Methanosphaerula palustris E1-9c
Halorubrum lacusprofundi ATCC 49239
Thermococcus gammatolerans EJ3
Sulfolobus islandicus L.S.2.15
Sulfolobus islandicus M.14.25
Sulfolobus islandicus Y.G.57.14
Sulfolobus islandicus Y.N.15.51
Halomicrobium mukohataei DSM 12286
Methanobrevibacter ruminantium M1
Methanocaldococcus vulcanius M7
Haloterrigena turkmenica DSM 5511
Natrialba magadii ATCC 43099
Methanohalophilus mahii DSM 5219
Natronomonas pharaonis DSM 2160
Methanocella arvoryzae MRE50
Halobacterium salinarum R1
Methanothermobacter marburgensis str. Marburg
Methanoplanus petrolearius DSM 11571
Thermococcus barophilus MP
Thermococcus sp. AM4
Ferroplasma acidarmanus fer1
Methanothermus fervidus DSM 2088
Halogeometricum borinquense DSM 11551
Methanothermococcus okinawensis IH1
Desulfurococcus mucosus DSM 2162
Sulfolobus islandicus HVE10/4
Vulcanisaeta moutnovskia 768-28
Thermoproteus uzoniensis 768-20
Archaeoglobus veneficus SNP6
Methanosarcina barkeri str. Fusaro
Methanohalobium evestigatum Z-7303
Methanosaeta concilii GP-6
Metallosphaera cuprina Ar-4
Acidianus hospitalis W1
Methanobacterium paludis strain SWAN1
Halopiger xanaduensis SH-6
Thermococcus sp. 4557
Pyrolobus fumarii 1A
Haloarcula hispanica ATCC 33960
Halophilic archaeon DL31
Natronobacterium gregoryi SP2
Natrinema pellirubrum DSM 15624
Halobacterium sp. DL1
Pyrobaculum ferrireducens strain 1860
Methanosaeta harundinacea 6Ac
Haloquadratum walsbyi C23
Thermogladius cellulolyticus 1633
Pyrococcus furiosus COM1
Natrinema sp. J7-2
Candidatus Nitrosopumilus koreensis AR1
Candidatus Nitrosopumilus sp. AR2
Candidatus Methanomethylophilus alvus Mx1201
Methanoculleus bourgensis MS2T
Methanolobus psychrophilus R15
Haloferax mediterranei ATCC 33500
Caldisphaera lagunensis DSM 15908
Methanoregula formicicum SMSP
Aciduliprofundum sp. MAR08-339
Halovivax ruber XH-70
Methanomethylovorans hollandica DSM 15978
Natronococcus occultus SP4
Sulfolobus acidocaldarius N8
Methanosarcina mazei Tuc01
Thermoplasmatales archaeon BRNA1
Archaeoglobus sulfaticallidus PM70-1
Salinarchaeum sp. Harcht-Bsk1
Candidatus Methanomassiliicoccus intestinalis Issoire-Mx1
Halorhabdus tiamatea SARL4B
Haloarcula hispanica N601
Sulfolobus acidocaldarius SUSAZ
Thermococcus sp. ES1
Thermococcus nautili strain 30-1
Aeropyrum camini SY1 = JCM 12091
Natronomonas moolapensis 8.8.11
Nitrososphaera viennensis EN76
Candidatus Nitrososphaera evergladensis SR1
Archaeoglobus fulgidus DSM 8774
Methanocaldococcus bathoardescens strain JH146
Methanobacterium formicicum strain BRM9
Thermococcus eurythermalis strain A501
Geoglobus acetivorans strain SBH6
Candidatus Methanoplasma termitum strain MpT1
Candidatus Nitrosopelagicus brevis strain CN25
Thermofilum carboxyditrophus 1505
Thermococcus guaymasensis DSM 11113
Haloarcula sp. CBA1115
Archaeon GW2011_AR10
Methanobacterium formicicum genome assembly DSM1535
Sulfolobus solfataricus strain SULB
Sulfolobus solfataricus strain SULC
Sulfolobus solfataricus strain SULA
Methanosarcina sp. WWM596
Methanosarcina barkeri str. Wiesmoor
Methanosarcina siciliae T4/M
Methanosarcina siciliae HI350
Methanosarcina mazei WWM610
Methanosarcina mazei LYC
Methanosarcina mazei C16
Methanosarcina lacustris Z-7289
Methanosarcina horonobensis HB-1
Methanosarcina barkeri 3
Methanococcoides methylutens MM1
Thermofilum sp. 1807-2
Geoglobus ahangari strain 234
Halanaeroarchaeum sulfurireducens strain HSR2
Pyrobaculum sp. WP30
Haloferax gibbonsii strain ARA6
Metallosphaera sedula strain ARS50-1
Metallosphaera sedula strain ARS120-2
Metallosphaera sedula strain SARC-M1
Halanaeroarchaeum sulfurireducens strain M27-SA2
Pyrodictium delaneyi strain Su06
Thermococcus barophilus strain CH5
Methanobrevibacter millerae strain SM9
Ignicoccus islandicus DSM 13165
Thermococcus sp. 2319x1
Halobacterium hubeiense genome assembly Halobacterium hubeiense JI20-1
Nanoarchaeota archaeon 7A
Methanogenic archaeon ISO4-H5
Methanobrevibacter olleyae strain YLM1
Pyrococcus sp. NCB100
Thermococcus sp. CDGS
Methanoculleus sp. MAB1 isolate Methanoculleus sp MAB1
Sulfolobus solfataricus strain P1
Aigarchaeota archaeon SCGC AAA471-E14
Aigarchaeota archaeon SCGC AAA471-B22
Aigarchaeota archaeon JGI 0000001-A7
Aigarchaeota archaeon JGI 0000106-J15
Aigarchaeota archaeon SCGC AAA471-E14
Aigarchaeota archaeon SCGC AAA471-E14
Aigarchaeota archaeon SCGC AAA471-F17
Candidatus Thorarchaeota archaeon SMTZ1-83
Crenarchaeota archaeon SCGC AAA471-B05
Crenarchaeota archaeon SCGC AAA471-L14
Thermoplasmatales archaeon DG-70 15865
Acidilobus saccharovorans 345-15
Caldisphaera lagunensis DSM 15908
Desulfurococcus fermentans DSM 16532
Ignisphaera aggregans DSM 17230
Staphylothermus hellenicus DSM 12710
Thermogladius cellulolyticus 1633
Thermosphaera aggregans DSM 11486
Hyperthermus butylicus DSM 5456
Pyrolobus fumarii 1A
Fervidicoccus fontis Kam940
Acidianus hospitalis W1
Metallosphaera sedula DSM 5348
Sulfolobus solfataricus P2
Sulfolobales archaeon Acd1
Sulfolobales archaeon AZ1 isolate
Thermofilum pendens Hrk 5456
Caldivirga maquilingensis IC-167
Thermoproteus neutrophilus V24Sta
Thermoproteus uzoniensis 768-20
Hadesarchaea archaeon YNP_45
Hadesarchaea archaeon YNP_N21
Candidatus Korarchaeum cryptofilum OPF8
Nanoarchaeota archaeon SCGC AAA011-G17
Nanoarchaeota archaeon SCGC AAA011-L15
Nanoarchaeum equitans Kin4-M
Candidatus Haloredivivus sp. G17
Candidatus Nanosalinarum sp. J07AB56
Candidatus Micrarchaeum acidiphilum ARMAN-2
Candidatus Parvarchaeum acidophilus ARMAN-5_'5-way FS'
Cenarchaeum symbiosum A
Thaumarchaeota archaeon CSP1-1
Thaumarchaeota archaeon SCGC AB-539-E09
Candidatus Nitrosopumilus koreensis AR1
Candidatus Nitrosoarchaeum koreensis MY1 MY1
Nitrososphaera viennensis EN76
Thaumarchaeota archaeon RBG_16_49_8
Thaumarchaeota archaeon MY2 NKMY2_1
Thaumarchaeota archaeon SCGC AAA282-K18
Thaumarchaeota archaeon SCGC AB-179-E04
Candidatus Caldiarchaeum subterraneum DNA
Marine Group I thaumarchaeote SCGC AB-629-I23
Marine Group III euryarchaeote SCGC AAA288-E19
Thaumarchaeota archaeon SCGC AAA007-O23
Marine Group I thaumarchaeote SCGC AAA799-E16
Marine Group I thaumarchaeote SCGC AAA799-N04
Candidatus Micrarchaeota archaeon RBG_16_49_10
Candidate division WOR_3 bacterium SM1_77
Candidatus Micrarchaeota archaeon RBG_16_36_9
Candidatus Nanopusillus sp. Nst1
halophilic archaeon J07HX64
Vulcanisaeta distributa DSM 14429
Plasmodium vivax
Trypanosoma brucei
Leishmania major strain Friedlin
Plasmodium falciparum strain 3D7
Eimeria tenella
Leishmania braziliensis MHOM/BR/75/M2904
Leishmania infantum JPCM5
Theileria annulata strain Ankara
Dictyostelium discoideum
Plasmodium knowlesi strain H
Toxoplasma gondii ME49 chromosome Ia
Thalassiosira pseudonana CCMP1335
Phaeodactylum tricornutum CCAP 1055/1
Cryptosporidium parvum Iowa II
Theileria parva strain Muguga
Neospora caninum Liverpool
Cryptosporidium parvum
Trypanosoma brucei gambiense DAL972
Leishmania donovani BPK282A1
Leishmania mexicana MHOM/GT/2001/U1103
Ectocarpus siliculosus strain Ec 32
Plasmodium cynomolgi strain B
Crithidia fasciculata strain Cf-Cl
Babesia equi strain WA
Leishmania sp. MAR LEM2494
Leishmania donovani strain BHU 1220
Nannochloropsis gaditana strain B-31
Babesia microti strain RI
Plasmodium coatneyi strain Hackeri
Theileria orientalis strain Shintoku
Leishmania panamensis strain MHOM/PA/94/PSC-1
Babesia bigemina genome assembly Bbig001
Leishmania peruviana PAB-4377_V1
Plasmodium reichenowi strain SY57
Plasmodium gaboni strain SY75
Plasmodium berghei
Plasmodium yoelii
Laurentiella
Helicosporidium sp. ATCC 50920
Coccomyxa sp. LA000219
Trebouxia gelatinosa isolate LA000220
Chlorella pyrenoidosa strain FACHB-9
Chlamydomonas applanata
Yarrowia lipolytica CLIB122
Schizosaccharomyces pombe
Zygosaccharomyces rouxii strain CBS732
Candida dubliniensis CD36
Encephalitozoon intestinalis
Saccharomyces kluyveri NRRL Y-12651
Aspergillus oryzae RIB40
Mycosphaerella graminicola IPO323
Myceliophthora thermophila ATCC 42464
Thielavia terrestris NRRL 8126
Eremothecium cymbalariae
Fusarium graminearum PH-1
Encephalitozoon hellem
Kazachstania africana CBS 2517
Saccharomyces cerevisiae R103
Valsa mali strain 03-8
Saccharomyces cerevisiae YJM1443
Saccharomyces cerevisiae YJM1447
Saccharomyces cerevisiae YJM1477
Saccharomyces cerevisiae YJM1479
Saccharomyces cerevisiae YJM1615
Sporisorium scitamineum strain SSC39
Torulaspora delbrueckii strain NRRL Y-50541
Eremothecium sinecaudum strain ATCC 58844
Kluyveromyces marxianus isolate B0399
Oikopleura dioica
Lottia gigantea
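
For context, here is a rough sketch of how names like the ones above could be matched to taxon identifiers, assuming a local copy of `names.dmp` from the NCBI taxonomy dump. I don't know whether this reflects how the training data was actually assembled. The function names `load_names` and `lookup` are made up for illustration, and the few sample lines only mimic the `names.dmp` layout so the snippet runs without the real file.

```python
def load_names(lines):
    """Build a lowercase name -> tax_id map from names.dmp-style lines.

    Each line is expected to look like:
    tax_id <TAB>|<TAB> name <TAB>|<TAB> unique name <TAB>|<TAB> name class <TAB>|
    """
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing field terminator
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        tax_id, name_txt, _unique_name, name_class = fields[:4]
        # Assumption: these name classes are enough for a first pass.
        if name_class in ("scientific name", "synonym", "equivalent name"):
            # The first entry seen for a given name wins.
            names.setdefault(name_txt.lower(), int(tax_id))
    return names


def lookup(organism, names):
    """Return (tax_id, matched_name), or None if nothing matches.

    Tries the full name first, then drops trailing words one at a time,
    so an unknown strain falls back to its species (or genus).
    """
    tokens = organism.split()
    while tokens:
        candidate = " ".join(tokens)
        if candidate.lower() in names:
            return names[candidate.lower()], candidate
        tokens.pop()
    return None


# Sample lines in names.dmp layout (stand-in for the real file).
SAMPLE = [
    "1773\t|\tMycobacterium tuberculosis\t|\t\t|\tscientific name\t|\n",
    "83332\t|\tMycobacterium tuberculosis H37Rv\t|\t\t|\tscientific name\t|\n",
    "5855\t|\tPlasmodium vivax\t|\t\t|\tscientific name\t|\n",
    "4896\t|\tSchizosaccharomyces pombe\t|\t\t|\tscientific name\t|\n",
]

if __name__ == "__main__":
    table = load_names(SAMPLE)
    for name in ["Mycobacterium tuberculosis H37Rv", "Plasmodium vivax", "Laurentiella"]:
        print(name, "->", lookup(name, table))
```

With the real file, `load_names(open("names.dmp"))` would replace the sample list. The fallback to shorter names is exactly why explicit taxon identifiers or accession numbers would be better: entries that differ only by strain would collapse to the same species-level ID.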