Skip to content

Feature/phy by pool#9

Merged
nickp60 merged 7 commits into
mainfrom
feature/phy-by-pool
Sep 25, 2025
Merged

Feature/phy by pool#9
nickp60 merged 7 commits into
mainfrom
feature/phy-by-pool

Conversation

@nickp60

@nickp60 nickp60 commented Sep 23, 2025

Copy link
Copy Markdown
Contributor

This adds an argument to vdb_make_phylo to create analyses by "oligos_id", the combination of sampleid and pool. This allows us to analyze samples sequenced multiple times.

@nickp60 nickp60 requested review from Anqi-Dai and miraep8 September 23, 2025 13:27
@nickp60 nickp60 force-pushed the feature/phy-by-pool branch from 39382d1 to 17b06c1 Compare September 23, 2025 13:52

@miraep8 miraep8 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for this to be added as an argument/agree it would be nice to allow for all results to be returned.

Changes requested are related to perceived issues in the edge case that sampleid_col != sampleid. Also just noticed something off with the docs for skip_seq unrelated to your change but maybe you could bundle it in?

Comment thread R/utilities.R Outdated
@@ -120,58 +120,65 @@ padMRN <- base::Vectorize(USE.NAMES = FALSE, function(mrn) {
#' @param metadata dataframe with at least one column sampleids; everything else is treated as sample info
#' @param sampleid_col name of the column containing sampleids
#' @param skip_seqs if true, sequences from asv_sequences_ag table will be included in phyloseq object. Not enabled by default

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be if false then the sequences will be added 🤔 (for skip_seqs) sorry just noticing this from proximity

Comment thread man/vdb_make_phylo.Rd Outdated

\item{sampleid_col}{name of the column containing sampleids}

\item{skip_seqs}{if true, sequences from asv_sequences_ag table will be included in phyloseq object. Not enabled by default}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also think that if false here. (for skip_seqs)

Comment thread R/utilities.R Outdated
taxa_are_rows = FALSE
)
if (by_oligo_id) sampleid_col <- "oligos_id"
tab <- counts %>% dplyr::select(asv_key, all_of(sampleid_col), count) %>%

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think that there might be an issue with trying to use sampleid_col to subset from counts. In the case that by_oligo_id = F then sampleid_col will be whatever the user put in. ie (my_special_sample_name) but this isn't a valid column name in asv_counts_ag which is where counts gets its colnames from.

Comment thread R/utilities.R Outdated
if (by_oligo_id){
samp <- phyloseq::sample_data(
metadata %>%
dplyr::left_join(counts %>% dplyr::select(sampleid, oligos_id) %>% dplyr::distinct()) %>% tibble::column_to_rownames(sampleid_col)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of similar to issue above, I think this assumes that metadata has a column called sampleid in order to work.


test_that("phy by oligo_id builds correctly", {
connect_database(bundled = TRUE)
tmp <- data.frame("sampleid" = c("1143N", "notarealsample"), group=c("A", "A"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe to check if the sampleid_col being called sampleid versus something else could try a test where the sample column has a different name?

@nickp60 nickp60 requested a review from miraep8 September 24, 2025 16:05
@nickp60

nickp60 commented Sep 24, 2025

Copy link
Copy Markdown
Contributor Author

Thanks @miraep8 good catches! I updated the functions upstream so that process_metadata handles the sampleid_col stuff, and we don't have to use non-standard evaluation for the rest of the functions.

@miraep8 miraep8 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should work to me/I think sticking the fix-it logic in process_metadata is a clean solution. Thanks! 🏛️

@nickp60 nickp60 merged commit 2bfb2d8 into main Sep 25, 2025
2 checks passed
@nickp60 nickp60 deleted the feature/phy-by-pool branch September 26, 2025 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants