3/Apr/26: Discussion - connecting omi to somatem; next steps for somatem #105
ppreshant started this conversation in Meeting notes
Organized summary and takeaways
Analysis types: change to these
Focus on these types. Assembly of MAGs by itself is not an "analysis type" per se, so we need to rethink how we include it.
- Kraken2, Ganon2, Sylph)
- Lemur: prokaryotic? / sylph for speed ~ Prok; viral; fungal)
- kente for single sample; rhea for a longitudinal series (time, treatment/other variable)
- Seqscreen: both reads and contigs; Bakta: annotating contigs/assemblies; if we can call this "functional" profiling?
- Strainify:
Next steps
- mimic-generated synthetic dataset from Eddy with PacBio reads: can just show an improved F1 score for species identification vs. any single tool
Important
- Austin -> PK: Complete the ensemble classification sub-workflow by integrating Ganon2, Sylph, and Lemur
- owlet03 (initial implementation without job tracking; only for Rice logins?)
- owlet03: no load balancing
- running mode field) for users to bring their own API key. PK: ideally the orchestrator/head should not be on GCloud; only the execution should happen there
- confidence level settings/message on omi to be analysis-type-specific, with inputs from PK:
Minor
- kente module and get information and DB of pangenome graphs from Natalie
PK's notes:
AI-generated summary by Zoom; the key points are extracted and highlighted above.
Quick recap
The meeting focused on planning the development and documentation of the Somatem pipeline and its companion tool OMI. Todd emphasized prioritizing the completion of a comprehensive paper draft by the end of the month, outlining four key analysis types: species detection, taxonomic profiling, HGT detection, and functional characterization. The team discussed technical improvements including adding a third taxonomic classifier (Ganon2), updating database configurations (q #qn what was this?), and implementing ensemble classification methods. Prashant was tasked with leading the paper writing effort, while Austin would work on ensemble implementation and Felix would handle backend configuration for local file transfers to Rice servers. The group also discussed updating analysis types, confidence levels, and sample status categories in the pipeline interface to better reflect the tool's capabilities and distinguish it from other pipelines in the field.
Next steps
Archived.
Other notes:
2. Prashant: Update the sylph database functionality (?)
3. Team: Schedule and attend follow-up meeting on Tuesday at 11 AM (Only Austin and me)
4. Prashant: Create a branch in Somatem and merge new modules within a couple hours
5. Austin: Begin work on ensemble classification functionality on Monday
6. (redundant) ~~Dongwei: Provide screenshots and useful prompts documentation for OMI section of paper~~
7. Prashant: Send Zoom link for Tuesday's meeting
Summary
Genome Sampling and Research Protocol
Austin discussed running 250 samples through assembly, resulting in thousands of genomes that may be relevant to their NFL-funded research involving Rice athlete samples. They noted that host DNA typically comprises only about 5-10% of gut samples, though this can vary depending on the sample type. The conversation also touched on IRB requirements regarding host DNA collection, though this wasn't explicitly addressed in their protocols.
Cloud Migration to Orion Discussion
Prashant and Austin discussed moving their system from Google Cloud to Orion and whether Docker would still be needed. They concluded that Docker might not be necessary since ports can be handled locally, and that load balancing would have to be managed through Orion's VM capabilities rather than happening automatically as it did on Google Cloud. Prashant mentioned potential challenges with IT permissions and firewalls and considered asking IT office hours for help. The discussion ended with a suggestion to run the load balancer on two owlets, which would allow at most two users to run jobs simultaneously.
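The "two owlets, at most two concurrent users" idea amounts to a fixed pool of execution hosts where each job holds one host for its duration. Below is a minimal sketch of that policy, assuming one job per host; `HostPool` and the host names are hypothetical and not part of Somatem or omi.

```python
import threading
from contextlib import contextmanager


class HostPool:
    """Hypothetical dispatcher: hands out a free host, at most one job per host."""

    def __init__(self, hosts):
        self._free = list(hosts)
        self._lock = threading.Lock()

    @contextmanager
    def acquire(self):
        # Claim a free host under the lock; reject the job if every host is busy.
        with self._lock:
            if not self._free:
                raise RuntimeError("all hosts busy; try again later")
            host = self._free.pop(0)
        try:
            yield host
        finally:
            # Return the host to the pool when the job finishes or fails.
            with self._lock:
                self._free.append(host)


pool = HostPool(["host-a", "host-b"])  # hypothetical names for the two owlets
with pool.acquire() as host:
    print(f"running job on {host}")
```

This sketch rejects a third job outright. Queuing it instead (for example with `threading.Semaphore`) is an equally reasonable choice if users should wait rather than retry.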
Database Management for Pipeline Implementation
The team discussed how databases would be handled in the pipeline. Austin explained that databases such as Emu's are hosted on GitHub and can be cloned into the Conda environment's share folder. Todd clarified that databases should be available locally once the pipeline is installed, which removes the need for on-demand downloads; this was presented as an advantage over the previous Google Cloud approach.
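To illustrate "available locally once installed, no on-demand downloads", here is a sketch of resolving a database from the active Conda environment's `share` folder and failing fast if it is missing. `CONDA_PREFIX` is the environment variable Conda sets on activation. The `share/<db_name>` layout and the `resolve_database` helper are assumptions for illustration, not Somatem's actual code.

```python
import os
from pathlib import Path


def resolve_database(db_name, prefix=None):
    """Return the local directory of a pre-installed database (hypothetical layout).

    Assumes databases live under <conda env prefix>/share/<db_name>.
    Raises instead of downloading, so a missing database is caught at install time.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("no active Conda environment (CONDA_PREFIX is unset)")
    db_dir = Path(prefix) / "share" / db_name
    if not db_dir.is_dir():
        raise FileNotFoundError(
            f"database {db_name!r} not found at {db_dir}; "
            "install it with the pipeline instead of downloading on demand"
        )
    return db_dir
```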
Omi Configure Launch Parameters Focus
Todd emphasized making omi configure the launch parameters for Somatem rather than spending time on complex technical connections between the two systems. He advised against over-engineering the integration, suggested that manual file copying might be acceptable initially, and recommended prioritizing functionality that lets users set up their own local environment. Todd also stressed being realistic about timelines given team members' departures and the need to prepare for paper submission.
Client Application Configuration Discussion
Todd and Prashant discussed the configuration and implementation of a client application (omi) that would help users navigate running certain tools, without fully connecting to their servers due to concerns about file sizes, session management, and potential security issues. Todd suggested that if the lab decides to support this functionality, they could simply reactivate Google Cloud services rather than building additional infrastructure. They agreed to focus first on configuring and sending parameter files to users, with manual launching as an initial approach, while acknowledging that there are still technical issues to resolve regarding parameter support and run mode modifications.
Somatem Software Implementation Planning
The team discussed implementing and configuring Somatem with the goal of having it ready for review within a month. Todd emphasized keeping the initial setup simple and prioritizing ensemble classification and horizontal gene transfer (HGT) features to make the tool more attractive to users and reviewers. Prashant and Austin were tasked with adding a third classifier alongside Lemur and Sylph, with Ganon2 as a likely candidate, and with drafting a paper outline focused on the Somatem Copilot and omi integration. The team debated database options for storing chat history, with Todd advocating a NoSQL approach over SQL for its simplicity and future scalability.
Taxonomic Classifier Approach Refinement
Todd discussed the need to refine the paper's approach to taxonomic classifiers and analysis types. He suggested adding a third category for sample status (single, cross-sectional, and longitudinal) and proposed updating the analysis types to focus on specific outputs rather than assembly methods. Todd recommended changing the analysis types to include HGT detection, species detection, taxonomic profiling, and functional characterization, with confidence levels tailored to each analysis type. He also suggested adding technology qualifiers for sequencing, including specific versions of R10, and incorporating a Google Cloud field for processing options.
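The interface fields proposed here (analysis type, sample status, sequencing technology, processing option) map naturally onto a small validated parameter record that omi could write out for a user to launch manually. This is a hedged sketch: the field names, the allowed `run_mode` values, and the JSON output format are assumptions, not Somatem's actual parameter schema. Per-analysis-type confidence levels are left out for brevity.

```python
import json
from dataclasses import asdict, dataclass

# Allowed values taken from the meeting discussion; identifiers are hypothetical.
ANALYSIS_TYPES = {"species_detection", "taxonomic_profiling",
                  "hgt_detection", "functional_characterization"}
SAMPLE_STATUSES = {"single", "cross_sectional", "longitudinal"}
RUN_MODES = {"local", "google_cloud"}  # assumption: the "Google Cloud field"


@dataclass
class LaunchParams:
    analysis_type: str
    sample_status: str
    sequencing_technology: str  # free text for now, e.g. a specific R10 version
    run_mode: str = "local"

    def validate(self):
        if self.analysis_type not in ANALYSIS_TYPES:
            raise ValueError(f"unknown analysis type: {self.analysis_type}")
        if self.sample_status not in SAMPLE_STATUSES:
            raise ValueError(f"unknown sample status: {self.sample_status}")
        if self.run_mode not in RUN_MODES:
            raise ValueError(f"unknown run mode: {self.run_mode}")


def to_params_json(params):
    """Validate and serialize parameters to a JSON file body for manual launch."""
    params.validate()
    return json.dumps(asdict(params), indent=2, sort_keys=True)


print(to_params_json(LaunchParams("hgt_detection", "longitudinal", "R10")))
```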
Paper Restructuring and Tool Implementation
The team discussed restructuring the paper into four sub-areas: OMI Copilot, taxonomic profiling, species detection, and HGT detection with functional characterization. Todd recommended using SeqScreen for functional profiling, emphasizing that the tool only needs to be run and its output generated, without further integration. The team also discussed a parameters system in which users select and configure tools, and Todd offered to help provide the necessary parameter documentation. Todd highlighted ensemble classification with confidence-level modifiers as the innovative aspect and assigned tasks: Prashant handles most changes, Dongwei focuses on database updates, and Felix implements file submission for rice.edu users.
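One simple way to get "ensemble classification with confidence levels" is vote counting across classifiers. The sketch below assumes each tool (e.g. Lemur, Sylph, Ganon2) has already been reduced to a set of detected species, and assigns "high" when all tools agree, "medium" for a strict majority, and "low" otherwise. The voting rule and the thresholds are illustrative assumptions, not the sub-workflow Austin is building.

```python
from collections import Counter


def confidence_label(n_votes, n_tools):
    """Map tool agreement to a label (hypothetical thresholds)."""
    if n_votes == n_tools:
        return "high"
    if 2 * n_votes > n_tools:  # strict majority
        return "medium"
    return "low"


def ensemble_calls(tool_calls):
    """Combine per-tool species sets into {species: confidence label}."""
    votes = Counter()
    for species_set in tool_calls.values():
        votes.update(set(species_set))  # one vote per tool per species
    n_tools = len(tool_calls)
    return {sp: confidence_label(n, n_tools) for sp, n in votes.items()}


# Placeholder species names; real inputs would come from each tool's report.
calls = {
    "lemur": {"sp_A", "sp_B"},
    "sylph": {"sp_A", "sp_B"},
    "ganon2": {"sp_A", "sp_C"},
}
print(ensemble_calls(calls))
```

A weighted vote (e.g. trusting one tool more for a given kingdom) would drop into `confidence_label` without changing the interface.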
File Transfer Feature Implementation Planning
Todd and Prashant discussed implementing a file transfer feature as the first step, with plans to add a launch button for Rice.edu users later. Todd emphasized structuring a paper template around analysis types, sequencing technology, and data sets including mimic-generated and clinical samples. Dongwei was tasked with creating screenshots that demonstrate different prompts and highlight features such as Copilot functionality, GPT model selection, and history saving, while emphasizing the need for guardrails in the implementation.
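For the mimic-generated data set, where the true species list is known, the notes propose showing an improved F1 score for species identification versus any single tool. Here is a minimal sketch of that metric, treating predictions and ground truth as sets of species names. `species_f1` is a hypothetical helper, and ignoring abundances is a simplifying assumption.

```python
def species_f1(predicted, truth):
    """F1 = harmonic mean of precision and recall over species sets."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # species correctly identified
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Example: 2 of 3 predictions are correct and 2 of 4 true species are found,
# so precision = 2/3, recall = 1/2, F1 = 4/7.
print(species_f1({"sp_A", "sp_B", "sp_E"}, {"sp_A", "sp_B", "sp_C", "sp_D"}))
```

Scoring the ensemble output and each single tool against the same truth set gives the comparison described in the notes.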
Paper Draft Progress Planning Meeting
The team discussed progress on the paper draft and various technical tasks. Todd assigned Prashant to complete a full V1 draft of the paper by the end of the month, with a time budget of 2-3 days. Austin was tasked with the ensemble work, while Felix was allocated at most 3 days for backend tasks before moving on to other priorities. Prashant agreed to create a Ganon module and address database download issues, and Austin planned to work on sub-workflows. They scheduled a follow-up meeting for Tuesday at 11 AM.
The application note format discussed previously may not be enough to fit all the details about Somatem and omi. Let's go long!