Feature/phy by pool #9
39382d1 to 17b06c1
miraep8 left a comment
Happy for this to be added as an argument, and agree it would be nice to allow all results to be returned.
The changes requested relate to perceived issues in the edge case where sampleid_col != sampleid. I also just noticed something off with the docs for skip_seqs. It's unrelated to your change, but maybe you could bundle it in?
@@ -120,58 +120,65 @@ padMRN <- base::Vectorize(USE.NAMES = FALSE, function(mrn) {
#' @param metadata dataframe with at least one column sampleids; everything else is treated as sample info
#' @param sampleid_col name of the column containing sampleids
#' @param skip_seqs if true, sequences from asv_sequences_ag table will be included in phyloseq object. Not enabled by default
I think this should say that if false, the sequences will be added 🤔 (for skip_seqs). Sorry, just noticing this from proximity.
\item{sampleid_col}{name of the column containing sampleids}
\item{skip_seqs}{if true, sequences from asv_sequences_ag table will be included in phyloseq object. Not enabled by default}
I also think this should be "if false" here (for skip_seqs).
taxa_are_rows = FALSE
)
if (by_oligo_id) sampleid_col <- "oligos_id"
tab <- counts %>% dplyr::select(asv_key, all_of(sampleid_col), count) %>%
Hmmm, I think there might be an issue with using sampleid_col to subset counts. When by_oligo_id = FALSE, sampleid_col will be whatever the user passed in (e.g. my_special_sample_name), but that isn't a valid column name in asv_counts_ag, which is where counts gets its column names from.
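A minimal sketch of one way around this, assuming (as the `dplyr::select(sampleid, oligos_id)` call in the next hunk suggests) that counts always carries the fixed `sampleid` and `oligos_id` columns from asv_counts_ag: choose the counts-side key from `by_oligo_id` alone and only ever look up the user's `sampleid_col` in metadata. `counts_key_col()` is a hypothetical helper, not part of the package, the toy values and `oligos_id` format are made up, and base R stands in for dplyr so the sketch runs on its own.

```r
# Hypothetical helper (not part of the package): decide which column of
# `counts` identifies a sample. Assumes counts keeps the fixed column names
# from asv_counts_ag ("sampleid", plus "oligos_id"); the user-supplied
# sampleid_col is never used to index counts.
counts_key_col <- function(counts, by_oligo_id = FALSE) {
  key <- if (isTRUE(by_oligo_id)) "oligos_id" else "sampleid"
  if (!key %in% colnames(counts)) {
    stop(sprintf("counts has no column named '%s'", key), call. = FALSE)
  }
  key
}

# Toy counts table standing in for asv_counts_ag (made-up values and ids).
counts <- data.frame(
  asv_key   = c("asv1", "asv2", "asv1"),
  sampleid  = c("1143N", "1143N", "2001A"),
  oligos_id = c("1143N_poolA", "1143N_poolA", "2001A_poolB"),
  count     = c(10L, 5L, 7L)
)

# Subset with the counts-side key, whatever metadata calls its column.
key <- counts_key_col(counts, by_oligo_id = FALSE)
tab <- counts[, c("asv_key", key, "count")]
```

The design choice here is that the metadata column name and the counts column name are decoupled: the join step is then the only place that has to reconcile them.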
if (by_oligo_id){
samp <- phyloseq::sample_data(
metadata %>%
dplyr::left_join(counts %>% dplyr::select(sampleid, oligos_id) %>% dplyr::distinct()) %>% tibble::column_to_rownames(sampleid_col)
Kind of similar to the issue above: I think this assumes that metadata has a column called sampleid in order to work.
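A minimal sketch of the kind of normalization that would remove this assumption: rename the user's sample column to a fixed `sampleid` before joining, then use `oligos_id` as the row names. `normalize_metadata()` and `attach_oligo_ids()` are hypothetical helpers, not the package's code; base R `merge()` stands in for `dplyr::left_join()`, and the data (including the `oligos_id` format) are made up. Unlike the left join above, this uses an inner join so that a metadata sample with no sequencing run cannot end up with an NA row name.

```r
# Hypothetical helpers, not the package's code.

# Rename the user's sample column to the fixed name "sampleid".
normalize_metadata <- function(metadata, sampleid_col) {
  if (!sampleid_col %in% colnames(metadata)) {
    stop(sprintf("metadata has no column named '%s'", sampleid_col),
         call. = FALSE)
  }
  if (sampleid_col != "sampleid" && "sampleid" %in% colnames(metadata)) {
    stop("metadata already has a different 'sampleid' column", call. = FALSE)
  }
  names(metadata)[names(metadata) == sampleid_col] <- "sampleid"
  metadata
}

# Attach every oligos_id seen for each sample and use it as the row name.
# Inner join: a sample with no sequencing run (such as "notarealsample")
# would otherwise get an NA row name, which R does not allow.
attach_oligo_ids <- function(metadata, counts, sampleid_col) {
  md  <- normalize_metadata(metadata, sampleid_col)
  ids <- unique(counts[, c("sampleid", "oligos_id")])
  out <- merge(md, ids, by = "sampleid")
  rownames(out) <- as.character(out$oligos_id)
  out
}

# Usage with a differently named sample column and made-up counts.
meta <- data.frame(
  my_special_sample_name = c("1143N", "notarealsample"),
  group = c("A", "A")
)
counts <- data.frame(
  asv_key   = c("asv1", "asv1", "asv2"),
  sampleid  = c("1143N", "1143N", "1143N"),
  oligos_id = c("1143N_poolA", "1143N_poolB", "1143N_poolA"),
  count     = c(3L, 4L, 1L)
)
samp <- attach_oligo_ids(meta, counts, "my_special_sample_name")
```

Here `samp` has one row per sequencing run of 1143N, each carrying the sample's metadata, and "notarealsample" is dropped.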
test_that("phy by oligo_id builds correctly", {
connect_database(bundled = TRUE)
tmp <- data.frame("sampleid" = c("1143N", "notarealsample"), group=c("A", "A"))
Maybe, to check whether it matters that sampleid_col is called sampleid rather than something else, you could try a test where the sample column has a different name?
Thanks @miraep8, good catches! I updated the functions upstream so that…
miraep8 left a comment
This looks like it should work to me, and I think sticking the fix-it logic in process_metadata is a clean solution. Thanks! 🏛️
This adds an argument to vdb_make_phylo to create analyses by "oligos_id", the combination of sampleid and pool. This allows us to analyze samples sequenced multiple times.
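To make the motivation concrete, here is a small base-R sketch with made-up data showing how keying the count table on oligos_id instead of sampleid keeps repeat sequencing runs of one sample apart. The `count_matrix()` helper and the `oligos_id` values are hypothetical, and duplicates sharing a key are summed purely for illustration; how vdb_make_phylo itself handles a sample with several pools is not shown in this thread.

```r
# Toy data (made-up values): sample "1143N" was sequenced in two pools,
# so it has two oligos_id values. The oligos_id format here is invented.
counts <- data.frame(
  asv_key   = c("asv1", "asv2", "asv1", "asv1"),
  sampleid  = c("1143N", "1143N", "1143N", "2001A"),
  oligos_id = c("1143N_poolA", "1143N_poolA", "1143N_poolB", "2001A_poolA"),
  count     = c(10, 5, 8, 7)
)

# Hypothetical helper: build a sample-by-ASV matrix (samples as rows, in the
# spirit of taxa_are_rows = FALSE) keyed on the chosen column. Counts that
# share a key are summed; missing sample/ASV pairs become 0.
count_matrix <- function(counts, key) {
  tapply(counts$count, list(counts[[key]], counts$asv_key), sum, default = 0)
}

by_sample <- count_matrix(counts, "sampleid")   # both pools of 1143N merged
by_oligo  <- count_matrix(counts, "oligos_id")  # each sequencing run kept
```

With sampleid as the key, the two runs of 1143N collapse into a single row; with oligos_id, they remain separate rows that can be compared directly.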