Feature/phy by pool by nickp60 · Pull Request #9 · vdblab/vdbR

nickp60 · 2025-09-23T13:27:53Z

This adds an argument to vdb_make_phylo to create analyses by "oligos_id", the combination of sampleid and pool. This allows us to analyze samples sequenced multiple times.

miraep8

Happy for this to be added as an argument; I agree it would be nice to allow all results to be returned.

The changes I requested relate to possible issues in the edge case where sampleid_col != "sampleid". I also just noticed something off in the docs for skip_seqs. It is unrelated to your change, but maybe you could bundle the fix in?

miraep8 · 2025-09-23T19:27:29Z

@@ -120,58 +120,65 @@ padMRN <- base::Vectorize(USE.NAMES = FALSE, function(mrn) {
 #' @param metadata dataframe with at least one column sampleids; everything else is treated as sample info
 #' @param sampleid_col name of the column containing sampleids
 #' @param skip_seqs if true, sequences from asv_sequences_ag table will be included in phyloseq object.  Not enabled by default


I think this should say that if false, the sequences will be added 🤔 (for skip_seqs). Sorry, I only noticed this because it is next to your change.

miraep8 · 2025-09-23T19:28:25Z


 \item{sampleid_col}{name of the column containing sampleids}

 \item{skip_seqs}{if true, sequences from asv_sequences_ag table will be included in phyloseq object.  Not enabled by default}


I think this should also read "if false" here (for skip_seqs).

miraep8 · 2025-09-23T19:35:18Z

-    taxa_are_rows = FALSE
-  )
+  if (by_oligo_id) sampleid_col <- "oligos_id"
+  tab <- counts %>% dplyr::select(asv_key, all_of(sampleid_col), count) %>% 


Hmmm, I think there might be an issue with using sampleid_col to subset counts. When by_oligo_id = FALSE, sampleid_col will be whatever the user passed in (e.g. my_special_sample_name), but that isn't a valid column name in asv_counts_ag, which is where counts gets its column names from.
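The concern can be illustrated with a toy table. This is only a sketch: the data frame below is invented, and only the column names asv_key, sampleid, oligos_id and count are taken from the diff. The oligos_id values in particular are made up and do not reflect the real format.

```r
# Toy stand-in for the counts table; all values are invented for illustration.
counts <- data.frame(
  asv_key   = c("asv_1", "asv_2"),
  sampleid  = c("1143N", "1143N"),
  oligos_id = c("1143N_poolA", "1143N_poolA"),  # made-up id format
  count     = c(10L, 3L)
)

# A user-supplied name for the sample column in their own metadata.
sampleid_col <- "my_special_sample_name"

# The user's name is not a column of counts, so selecting by it cannot work.
user_col_in_counts <- sampleid_col %in% names(counts)

# The table's own sample column is always available under its fixed name.
fixed_col_in_counts <- "sampleid" %in% names(counts)
```

Here user_col_in_counts is FALSE and fixed_col_in_counts is TRUE, which is why a select on counts keyed by sampleid_col would fail for any metadata column name other than sampleid.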

miraep8 · 2025-09-23T19:36:47Z

+  if (by_oligo_id){
+    samp <- phyloseq::sample_data(
+      metadata %>% 
+        dplyr::left_join(counts %>% dplyr::select(sampleid, oligos_id) %>% dplyr::distinct()) %>% tibble::column_to_rownames(sampleid_col)


Similar to the issue above: I think this only works if metadata has a column called sampleid.

miraep8 · 2025-09-23T19:37:40Z

+
+test_that("phy by oligo_id builds correctly", {
+  connect_database(bundled = TRUE)
+  tmp <- data.frame("sampleid" = c("1143N", "notarealsample"), group=c("A", "A"))


To check that this works whether sampleid_col is "sampleid" or something else, maybe add a test where the sample column has a different name?

nickp60 · 2025-09-24T16:06:36Z

Thanks @miraep8, good catches! I updated the functions upstream so that process_metadata handles the sampleid_col logic, and we don't have to use non-standard evaluation in the rest of the functions.
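The actual change is in commit 7db7db0 and is not shown in this conversation. As a rough sketch of the general idea only, a helper along these lines could rename the user's column to sampleid once, so that later joins can rely on the fixed name. The function name normalize_sampleid_col is hypothetical and is not the package's process_metadata.

```r
# Hypothetical helper: rename the user's sample column to "sampleid".
# This is an illustration, not the implementation used in vdbR.
normalize_sampleid_col <- function(metadata, sampleid_col = "sampleid") {
  if (!sampleid_col %in% names(metadata)) {
    stop("Column '", sampleid_col, "' not found in metadata")
  }
  # Avoid silently creating two columns named "sampleid".
  if (sampleid_col != "sampleid" && "sampleid" %in% names(metadata)) {
    stop("metadata already has a 'sampleid' column")
  }
  names(metadata)[names(metadata) == sampleid_col] <- "sampleid"
  metadata
}

# Metadata whose sample column has a non-default name.
meta <- data.frame(
  my_special_sample_name = c("1143N", "notarealsample"),
  group = c("A", "A")
)

meta_std <- normalize_sampleid_col(meta, sampleid_col = "my_special_sample_name")
names(meta_std)
```

With the column renamed up front, downstream code can join on sampleid directly instead of carrying the user's column name through every function.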

miraep8

This looks like it should work to me, and I think putting the fix-it logic in process_metadata is a clean solution. Thanks! 🏛️

nickp60 added 2 commits September 19, 2025 15:08

add arg for creating phyloseq by oligo ID (pool id) rather than sample id

296247b

fix weird issue with df casting

e13de70

nickp60 requested review from Anqi-Dai and miraep8 September 23, 2025 13:27

nickp60 added 2 commits September 23, 2025 09:48

Adjust test to samples included in test db

0f9231f

fix test

17b06c1

nickp60 force-pushed the feature/phy-by-pool branch from 39382d1 to 17b06c1 September 23, 2025 13:52

Update docs

4835bd5

miraep8 requested changes Sep 23, 2025

simplify sampleid_col logic across functions

7db7db0

nickp60 requested a review from miraep8 September 24, 2025 16:05

miraep8 approved these changes Sep 24, 2025

Increment version number to 0.14.0

b73c5f2

nickp60 merged commit 2bfb2d8 into main Sep 25, 2025
2 checks passed

nickp60 deleted the feature/phy-by-pool branch September 26, 2025 18:55

nickp60 merged 7 commits into main from feature/phy-by-pool
