3/Apr/26: Discussion - connecting omi to somatem; next steps for somatem #105

ppreshant · 2026-04-03T19:00:37Z

ppreshant
Apr 3, 2026
Maintainer

Organized summary and takeaways

Analysis types: change to these

focus on these types; assembly of mags by itself is not an "analysis type" per se ; so need to re-think how we include that

Species detection: ensemble of 3 tools (Kraken2, Ganon2, Sylph)
- consensus: using 2/3 tools for species calls and use parent species presence for strain calls
Taxonomic profiling: Lemur (prokaryotic?) / sylph for speed (~ Prok ; viral ; fungal)
SV detection and HGT: kente for single sample ; rhea for a longitudinal series (time, treatment/other variable)
Functional profiling: Seqscreen : both reads and contigs ;
- Bakta : annotating contigs/assemblies ; if we can call this "functional" profiling?
Ambiguous/other
- mag assembly: how to include this in the paper? ; might be better plugged into other steps ; but rhea's flye and Seqscreen's minimap steps already exist within the executables..
- Future: Greyed out? / Strainify :
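
A minimal sketch of the 2/3 consensus rule for species calls, with strain calls kept only when the parent species passed consensus. This is an illustration under assumptions, not the somatem implementation: the function names, the input shapes (a set of species names per tool, a strain-to-parent mapping), and the example taxa are all hypothetical.

```python
from collections import Counter


def consensus_species(calls_by_tool, min_votes=2):
    """Return species reported by at least `min_votes` tools.

    calls_by_tool: dict mapping tool name -> set of species names (assumed shape).
    """
    votes = Counter()
    for species in calls_by_tool.values():
        votes.update(set(species))  # each tool votes at most once per species
    return {sp for sp, n in votes.items() if n >= min_votes}


def supported_strains(strain_to_parent, species_consensus):
    """Keep strain calls whose parent species is in the consensus set."""
    return {
        strain
        for strain, parent in strain_to_parent.items()
        if parent in species_consensus
    }


# Toy example calls (made up for illustration only).
calls = {
    "kraken2": {"Escherichia coli", "Bacteroides fragilis"},
    "ganon2": {"Escherichia coli", "Klebsiella pneumoniae"},
    "sylph": {"Escherichia coli", "Bacteroides fragilis"},
}
species = consensus_species(calls)
strains = supported_strains(
    {
        "E. coli K-12": "Escherichia coli",
        "K. pneumoniae HS11286": "Klebsiella pneumoniae",
    },
    species,
)
```

Here `Klebsiella pneumoniae` is reported by only one tool, so both the species and its strain are dropped.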

Next steps

(timeline) Completion of a comprehensive paper draft by the end of the month (Apr 30th)

Todd: on April 30th, this only leaves 3 weeks to wrap up a ton of stuff. I'm confident we can get there, but if you get stuck on anything, reach out to let me know.
- Get feedback from Todd on the outline of the paper
- Prashant and Dongwei: Collaborate on paper draft, with Dongwei focusing on omi section
- somatem: focus on the unique features of the pipeline (rhea, kente, ensemble) ; emphasis on confidence level and ensemble features (seqscreen: contigs + reads ; species identification: 3 tools)
  - focusing on configuration updates and analysis types
Paper venue: let's try for NBT as an aspirational goal; if they say no, we will then go to Ncomms
(Updates): Weekly updates on writing, somatem and omi updates/status and next tasks
(Results): Ensemble: Get mimic generated synthetic dataset from Eddy with pacbio reads. Can just show an improved F1 score for species identification vs any single tool
- Also use some real data: ONT reads vs contigs : ?
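
For the F1 comparison, a minimal sketch of how species-level F1 could be computed from a predicted set and a ground-truth set. The function name and the placeholder taxa (`sp_A` etc.) are hypothetical; no real results are implied.

```python
def f1_score(predicted, truth):
    """F1 for presence/absence species calls, given two sets of names."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Placeholder sets, only to show the calculation.
truth = {"sp_A", "sp_B", "sp_C"}
single_tool = {"sp_A", "sp_B", "sp_D", "sp_E"}  # two false positives
ensemble = {"sp_A", "sp_B"}                     # false positives filtered out

f1_single = f1_score(single_tool, truth)
f1_ensemble = f1_score(ensemble, truth)
```

With a synthetic dataset the truth set is known by construction, so the same function can be applied to each single tool and to the ensemble output.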

Important

~~Austin~~ -> PK: Complete the ensemble classification sub-workflow by integrating Ganon 2, Sylph, and Lemur
Felix: Implement functionality to send configuration file + 16S data to server on owlet03 (initial implementation without job tracking ; only for Rice logins?)
- Start simple:
  1. Dongwei/omi: first make a detailed config file with all params : one can copy this file and use to run manually
  2. Felix/connection: single run node/server = owlet03 - no load balancing
    - This is useful for the paper reviewers to interact with omi-somatem by the 16S pipeline run on our servers / on google cloud if Todd decides to pay for it.
  3. Felix/connection: generalize to run on user's own infrastructure (while keeping omi localized to us!)
- This is a precursor to the second implementation mode where we let users connect to their own servers to run somatem: listener should be easy to deploy + just changing the ip-address should work? to communicate to omi
- Also could add a google cloud option (in a running mode field) for users to bring their own API key -- PK: the orchestrator/head should not be on Gcloud ideally.. only the execution should happen there
  - PK to think about this
Dongwei: Update omi parameters tab to allow user configuration of tool parameters : work with Prashant
Dongwei: Update confidence level settings/message on omi to be analysis type-specific with inputs from PK:
- seqscreen/functional profiling:
  - Regular / fast mode: uses only reads (saves time of making contigs) : seqscreen has functionality to integrate both outputs already
  - high confidence uses both reads and contigs
- Species identification:
  - Regular / fast mode : uses only sylph
  - High confidence uses the 3 tools in ensemble
- What else?
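
One way the analysis type-specific confidence levels above could be represented, as a sketch only. The key names, the `resolve_run` helper, and the dictionary layout are assumptions, not the omi or somatem config schema. The three ensemble tools are the ones listed under Species detection in these notes.

```python
# Hypothetical mapping: analysis type -> confidence level -> what gets run.
RUN_MODES = {
    "functional_profiling": {
        "fast": {"tools": ["seqscreen"], "inputs": ["reads"]},
        "high_confidence": {"tools": ["seqscreen"], "inputs": ["reads", "contigs"]},
    },
    "species_identification": {
        "fast": {"tools": ["sylph"]},
        "high_confidence": {"tools": ["kraken2", "ganon2", "sylph"]},
    },
}


def resolve_run(analysis_type, confidence):
    """Look up the tools (and inputs, if any) for a given selection."""
    try:
        return RUN_MODES[analysis_type][confidence]
    except KeyError:
        raise ValueError(
            f"No run mode for {analysis_type!r} at confidence {confidence!r}"
        ) from None


selection = resolve_run("species_identification", "high_confidence")
```

Analysis types without an entry yet (the "What else?" item) fail loudly with a `ValueError` rather than silently falling back to a default.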

Minor

PK's notes:

AI generated summary by Zoom ; Extracted and highlighted key stuff from this above..

Quick recap

The meeting focused on planning the development and documentation of the Somatem pipeline and its companion tool OMI. Todd emphasized prioritizing the completion of a comprehensive paper draft by the end of the month, outlining four key analysis types: species detection, taxonomic profiling, HGT detection, and functional characterization. The team discussed technical improvements including adding a third taxonomic classifier (Ganon2), updating database configurations (q #qn what was this?), and implementing ensemble classification methods. Prashant was tasked with leading the paper writing effort, while Austin would work on ensemble implementation and Felix would handle backend configuration for local file transfers to Rice servers. The group also discussed updating analysis types, confidence levels, and sample status categories in the pipeline interface to better reflect the tool's capabilities and distinguish it from other pipelines in the field.

Next steps

Archived..

Other notes:
2. Prashant: Update the sylph database functionality (?)
3. Team: Schedule and attend follow-up meeting on Tuesday at 11 AM (Only Austin and me)
4. Prashant: Create a branch in Somatem and merge new modules within a couple hours
5. Austin: Begin work on ensemble classification functionality on Monday
6. (redundant) ~~Dongwei: Provide screenshots and useful prompts documentation for OMI section of paper~~
7. Prashant: Send Zoom link for Tuesday's meeting

Summary

Genome Sampling and Research Protocol

Austin discussed running 250 samples through assembly, resulting in thousands of genomes that may be relevant to their NFL-funded research involving Rice athlete samples. They noted that host DNA typically comprises only about 5-10% of gut samples, though this can vary depending on the sample type. The conversation also touched on IRB requirements regarding host DNA collection, though this wasn't explicitly addressed in their protocols.

Cloud Migration to Orion Discussion

Prashant and Austin discussed moving their system from Google Cloud to Orion, exploring whether Docker would still be needed. They determined that Docker might not be necessary since they can handle ports locally, and load balancing would need to be managed through Orion's VM capabilities rather than automatically as it was on Google Cloud. Prashant mentioned potential challenges with IT permissions and firewalls, and considered reaching out to IT office hours for assistance. The discussion concluded with a suggestion to run the load balancer on two owlets, potentially allowing two users maximum to run jobs simultaneously.

Database Management for Pipeline Implementation

The team discussed database handling for their pipeline implementation. Prashant and Austin clarified how databases would be managed, with Austin explaining that databases like EMU are pre-installed on GitHub and can be cloned into the Conda environment's share folder. Todd clarified that databases should be available locally once the pipeline is installed, eliminating the need for on-demand downloads, which was presented as an advantage over their previous Google Cloud approach.

Omi Configure Launch Parameters Focus

Todd emphasized focusing on making omi configure launch parameters for Somatem rather than spending time on complex technical connections between the two systems. He advised against over-engineering the integration, suggesting that manual file copying might be acceptable initially, and recommended prioritizing functionality that allows users to set up their own local environment. Todd also stressed the importance of being realistic about timelines given team members' departures and the need to prepare for paper submission.

Client Application Configuration Discussion

Todd and Prashant discussed the configuration and implementation of a client application (omi) that would help users navigate running certain tools, without fully connecting to their servers due to concerns about file sizes, session management, and potential security issues. Todd suggested that if the lab decides to support this functionality, they could simply reactivate Google Cloud services rather than building additional infrastructure. They agreed to focus first on configuring and sending parameter files to users, with manual launching as an initial approach, while acknowledging that there are still technical issues to resolve regarding parameter support and run mode modifications.

Somatem Software Implementation Planning

The team discussed implementing and configuring the Somatem software with a focus on getting it ready for review within a month. Todd emphasized keeping the initial setup simple and prioritizing ensemble classification and horizontal gene transfer features to make the tool more attractive to users and reviewers. Prashant and Austin were tasked with adding a third classifier to join Lemur and Sylph, with Ganon2 being a potential candidate, while also working on a draft paper outline focusing on the Somatem Copilot and omi integration. The team debated database options for storing chat history, with Todd advocating for a NoSQL approach over SQL due to its simplicity and future scalability needs.

Taxonomic Classifier Approach Refinement

Todd discussed the need to refine the paper's approach to taxonomic classifiers and analysis types. He suggested adding a third category for sample status (single, cross-sectional, and longitudinal) and proposed updating the analysis types to focus on specific outputs rather than assembly methods. Todd recommended changing the analysis types to include HGT detection, species detection, taxonomic profiling, and functional characterization, with confidence levels tailored to each analysis type. He also suggested adding technology qualifiers for sequencing, including specific versions of R10, and incorporating a Google Cloud field for processing options.

Paper Restructuring and Tool Implementation

The team discussed restructuring a paper into four sub-areas: OMI Copilot, taxonomic profiling, species detection, and HGT detection with functional characterization. Todd recommended using Seqscreen for functional profiling, emphasizing that the tool should be run and output generated without the need for integration. The team also discussed implementing a parameters system where users can select and configure tools, with Todd offering to help provide the necessary parameters documentation. Todd highlighted the innovative aspect of ensemble classification with confidence level modifiers and assigned specific tasks to team members, including Prashant handling most changes, Dongwei focusing on database updates, and Felix implementing file submission functionality for rice.edu users.

File Transfer Feature Implementation Planning

Todd and Prashant discussed implementing a file transfer feature as the first step, with plans to later add a launch button for Rice.edu users. Todd emphasized the importance of structuring a paper template around analysis types, sequencing technology, and data sets including mimic-generated and clinical samples. Dongwei was tasked with creating screenshots demonstrating different prompts and highlighting features like co-pilot functionality, GPT model selection, and history saving capabilities, while emphasizing the need for guardrails in their implementation.

Paper Draft Progress Planning Meeting

The team discussed progress on a paper draft and various technical tasks. Todd assigned Prashant to focus on finishing a complete V1 draft of the paper by the end of the month, with a time budget of 2-3 days. Austin was tasked with ensemble work, while Felix was allocated 3 days maximum for backend tasks before needing to focus on other priorities. Prashant agreed to create a Ganon module and address database download issues, while Austin planned to work on sub-workflows. They scheduled a follow-up meeting for Tuesday at 11 AM to continue their collaboration.

The short 4 page application note format discussed previously may not be enough to get all the details about somatem and omi in there.. Let's go long~!
