FlexBio-PIPA

Flexible Bioinformatics Personal Intelligent Pipeline Agent

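A terminal-agent-based workflow for developing bioinformatics analysis tools and pipelines, aimed at multi-scenario, customized requirements.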


Overview

FlexBio-PIPA is a multi-agent system that automates the full lifecycle of bioinformatics pipeline development. A Research Orchestrator coordinates specialised sub-agents through a stage-gate workflow:

  • Research Orchestrator (ResearchOrchestratorAgent) — drives the end-to-end pipeline: requirements freeze, literature search, tool collection, benchmarking, Snakemake implementation, testing, and iterative refinement.
  • Plan (PlanAgent) — set goals and decompose tasks into structured plans.
  • Literature Search (LiteratureSearchAgent) — search PubMed, arXiv, and bioRxiv for relevant papers and methods.
  • Tool Collection (ToolCollectionAgent) — search Bioconda, Galaxy ToolShed, and Snakemake Hub for existing tools and workflows.
  • Benchmark (BenchmarkAgent) — gather benchmark results and performance comparisons for competing methods.
  • Workflow (WorkflowAgent) — generate or refine Snakemake pipelines.
  • Build (BuildAgent) — autonomously build custom tools/workflows when ready-made solutions are unavailable.
  • Test Plan (TestPlanAgent) — design test plans with datasets and validation rules.
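
The stage-gate coordination described above can be pictured with a small sketch. This is only an illustration, not the ResearchOrchestratorAgent implementation: the stage names follow the orchestrator description, while `Stage`, `run_stage_gate`, and the gate callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical model of one stage: a name plus a gate that inspects shared state.
@dataclass
class Stage:
    name: str
    gate: Callable[[Dict[str, object]], bool]


def run_stage_gate(stages: List[Stage], state: Dict[str, object]) -> List[str]:
    """Run stages in order and stop at the first gate that does not pass."""
    completed: List[str] = []
    for stage in stages:
        if not stage.gate(state):
            break  # a closed gate blocks every later stage
        completed.append(stage.name)
    return completed


# Stage names mirror the overview; the gate conditions are placeholders.
STAGES = [
    Stage("requirements_freeze", lambda s: bool(s.get("task"))),
    Stage("literature_search", lambda s: True),
    Stage("tool_collection", lambda s: True),
    Stage("benchmarking", lambda s: True),
    Stage("snakemake_implementation", lambda s: bool(s.get("tools"))),
    Stage("testing", lambda s: True),
]

done = run_stage_gate(STAGES, {"task": "RNA-seq differential expression"})
print(done)
```

In this example the run stops before the implementation stage because no tools have been shortlisted yet, which is the behavior a gate is meant to enforce.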

Each agent is enhanced with curated scientific skills drawn from K-Dense-AI/claude-scientific-skills (19 skills covering literature databases, bioinformatics libraries, and scientific methodology). Skills are automatically installed during project scaffolding and scoped per-agent via permission.skill frontmatter.

The LLM backend is OpenAI API-compatible, making it usable with:

| Backend | Notes |
| --- | --- |
| OpenCode | Primary target (native agent support) |
| Claude Code | Theoretically compatible |
| Codex / OpenAI | Theoretically compatible |
| Any OpenAI-compatible server | e.g. Ollama, LM Studio |

Quick Start

Installation

pip install flexbio-pipa
# or from source:
git clone https://github.com/WatchmanGu/FlexBio-PIPA.git
cd FlexBio-PIPA
pip install -e ".[dev]"

Configuration

Copy the example configuration and set your API key:

cp config/config.example.yaml config/config.yaml
export OPENAI_API_KEY="your-api-key"
# or set in config/config.yaml

Containerized Setup

FlexBio-PIPA can run inside any container runtime as long as the project workspace is mounted and the OpenCode-compatible LLM endpoint is reachable from inside the container.

Run the app container

  • Mount the project directory as a writable volume so generated artifacts stay on the host.
  • Mount or inject config/config.yaml and any secrets the same way you would for a local run.
  • Run commands from the mounted project root so AGENTS.md, .opencode/, and the stage directories stay aligned.

<container-runtime> run --rm \
  -v "$PWD:/workspace" \
  -w /workspace \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  <flexbio-pipa-image> \
  flexbio-pipa research-run --project-dir /workspace --task "..."

Configure the OpenCode runtime

  • Start an OpenCode-compatible server separately, for example with opencode serve.
  • Set llm.base_url to the reachable endpoint exposed by that runtime.
  • If the runtime is outside the container, replace localhost with the host or service name visible from the container.
  • Keep the project-local OpenCode layout on the mounted workspace so OpenCode can read AGENTS.md, .opencode/agents/, .opencode/skills/, and .opencode/tools/.

llm:
  base_url: "http://<reachable-host>:4096/v1"
  api_key: "${OPENAI_API_KEY}"
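
The localhost replacement mentioned above can be checked with a short helper. This is a sketch under assumptions: `rewrite_loopback` and the service name `opencode-server` are hypothetical and not part of FlexBio-PIPA. It only rewrites the host portion of `llm.base_url` when that host is a loopback address.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def rewrite_loopback(base_url: str, reachable_host: str) -> str:
    """Replace a loopback host in base_url with a host visible from the container."""
    parts = urlsplit(base_url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return base_url  # already points at a non-loopback host
    # Keep the original port if one was given (credentials in the URL are not preserved).
    netloc = reachable_host if parts.port is None else f"{reachable_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# "opencode-server" is a made-up service name used only for this example.
print(rewrite_loopback("http://localhost:4096/v1", "opencode-server"))
```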

OpenCode Project-Local Workflow (Recommended)

If you create a new directory for each client request or pipeline project, you can use FlexBio-PIPA's project-local OpenCode layout.

The idea is simple:

  • each project gets its own AGENTS.md
  • each project gets its own .opencode/skills/
  • each project keeps its own requirements, literature, tools, implementation, tests, and reports in one place

1. Create a project directory and local Python environment

mkdir my-rnaseq-project
cd my-rnaseq-project

python -m venv .venv
source .venv/bin/activate

# Recommended while developing the FlexBio-PIPA agent team from source
pip install -e /path/to/FlexBio-PIPA

# Or install from a package release instead
# pip install flexbio-pipa

2. Add a project-local config file

You can use either config/config.yaml or config.yaml inside the project directory.

If you want a tiny starting point, copy the included templates:

mkdir -p config
cp /path/to/FlexBio-PIPA/examples/opencode_project/config.yaml config/config.yaml
cp /path/to/FlexBio-PIPA/examples/opencode_project/project.yaml ./project.yaml

Example config/config.yaml:

llm:
  base_url: "https://api.openai.com/v1"
  api_key: "${OPENAI_API_KEY}"
  model: "gpt-4o"

research:
  execution_profile: "local"
  auto_execute: false
  dry_run: true
  max_refinement_cycles: 1
  scientific_skills_root: "/path/to/claude-scientific-skills/scientific-skills"

Then export your API key if needed:

export OPENAI_API_KEY="your-api-key"
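
The `${OPENAI_API_KEY}` placeholder in the config is filled from the environment when the config is loaded. The following is a minimal sketch of how such `${ENV_VAR}` resolution could work. It is not the code in `flexbio_pipa.utils.config`: `resolve_env` is a hypothetical helper, and leaving unset variables untouched is an assumption.

```python
import os
import re
from typing import Any

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value: Any) -> Any:
    """Recursively replace ${NAME} in strings with the value of os.environ[NAME]."""
    if isinstance(value, str):
        # Assumption: unset variables are left as-is; the real loader may differ.
        return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    return value


os.environ["OPENAI_API_KEY"] = "your-api-key"
config = {"llm": {"base_url": "https://api.openai.com/v1", "api_key": "${OPENAI_API_KEY}"}}
print(resolve_env(config)["llm"]["api_key"])
```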

3. Bootstrap the project for OpenCode

# from the task text directly
flexbio-pipa init-project \
  --project-dir . \
  --task "Build an RNA-seq differential expression pipeline"

# or from the example task template you copied
flexbio-pipa init-project \
  --project-dir . \
  --task-file project.yaml

This creates:

  • AGENTS.md - project-level agent instructions for OpenCode
  • .opencode/agents/ - project-local agent definitions (one .md per FlexBio-PIPA agent)
  • .opencode/skills/ - project-local skill overrides
  • .opencode/tools/ - project-local custom tools
  • 00_requirements/ through 05_reports/ - the working artifact layout
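
To confirm that a directory has been bootstrapped, one option is to check for the entries listed above. The sketch below assumes only the paths named in this README. `missing_scaffold_entries` is a hypothetical helper, not a FlexBio-PIPA command.

```python
import tempfile
from pathlib import Path
from typing import List

EXPECTED_DIRS = [
    ".opencode/agents", ".opencode/skills", ".opencode/tools",
    "00_requirements", "01_literature", "02_tools",
    "03_implementation", "04_tests", "05_reports",
]
EXPECTED_FILES = ["AGENTS.md"]


def missing_scaffold_entries(project_dir: Path) -> List[str]:
    """Return the expected directories and files that are absent from project_dir."""
    missing = [d for d in EXPECTED_DIRS if not (project_dir / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (project_dir / f).is_file()]
    return missing


# Demo on a throwaway directory that is only partially scaffolded.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "AGENTS.md").write_text("# rules\n")
    (project / ".opencode" / "agents").mkdir(parents=True)
    missing_demo = missing_scaffold_entries(project)
    print(missing_demo)
```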

4. Open the project in OpenCode

Start OpenCode in the project directory you just initialized.

Important project files:

  • AGENTS.md - project-specific working rules
  • .opencode/agents/ - project-specific agent definitions for OpenCode
  • .opencode/skills/ - project-specific skills you want OpenCode to use
  • .opencode/tools/ - project-specific custom tools
  • 00_requirements/intake.md - the main task description for the project

Skill precedence is:

  1. project-local .opencode/skills/
  2. shared scientific skills from research.scientific_skills_root
  3. repo-default skills bundled with FlexBio-PIPA

5. Run the research harness inside that project

# Use the task already stored in 00_requirements/intake.md
flexbio-pipa research-run --project-dir . --no-execute

# Or use the project task template directly
flexbio-pipa research-run --project-dir . --task-file project.yaml --no-execute

# Or override the task for a specific run
flexbio-pipa research-run \
  --project-dir . \
  --task "Build a metagenomic assembly pipeline" \
  --no-execute

# Run local execution as well
flexbio-pipa research-run --project-dir . --execute
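
When the harness is driven from a script rather than a shell, the same invocations can be assembled as an argument list. This sketch uses only the flags shown above. `research_run_argv` is a hypothetical helper, and treating `--task` and `--task-file` as mutually exclusive is an assumption.

```python
from typing import List, Optional


def research_run_argv(
    project_dir: str = ".",
    task: Optional[str] = None,
    task_file: Optional[str] = None,
    execute: bool = False,
) -> List[str]:
    """Build a `flexbio-pipa research-run` argument list from the documented flags."""
    if task is not None and task_file is not None:
        # Assumption: the two task sources are alternatives, not combinable.
        raise ValueError("pass either task or task_file, not both")
    argv = ["flexbio-pipa", "research-run", "--project-dir", project_dir]
    if task is not None:
        argv += ["--task", task]
    if task_file is not None:
        argv += ["--task-file", task_file]
    argv.append("--execute" if execute else "--no-execute")
    return argv


argv = research_run_argv(task_file="project.yaml")
print(" ".join(argv))
# To launch it from Python: subprocess.run(argv, check=True)
```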

During a run, FlexBio-PIPA writes structured artifacts into:

  • 00_requirements/ - frozen scope, clarifications, skill audit
  • 01_literature/ - search outputs and benchmark evidence
  • 02_tools/ - tool candidates and shortlist
  • 03_implementation/ - Snakemake files and implementation artifacts
  • 04_tests/ - test plan and validation inputs
  • 05_reports/ - execution analysis, revision requests, final summary
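
After a run, it can be useful to see which stage directories actually received files. The sketch below walks the six stage directories listed above. `list_artifacts` is a hypothetical helper, and the file names in the demo are examples rather than guaranteed outputs.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

STAGE_DIRS = [
    "00_requirements", "01_literature", "02_tools",
    "03_implementation", "04_tests", "05_reports",
]


def list_artifacts(project_dir: Path) -> Dict[str, List[str]]:
    """Map each stage directory to the sorted relative paths of the files under it."""
    summary: Dict[str, List[str]] = {}
    for stage in STAGE_DIRS:
        root = project_dir / stage
        files: List[str] = []
        if root.is_dir():
            files = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
        summary[stage] = sorted(files)
    return summary


# Demo on a throwaway project with two example files.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "00_requirements").mkdir()
    (project / "00_requirements" / "intake.md").write_text("Build an RNA-seq pipeline\n")
    (project / "03_implementation" / "envs").mkdir(parents=True)
    (project / "03_implementation" / "Snakefile").write_text("rule all:\n    input: []\n")
    artifact_summary = list_artifacts(project)
    print(artifact_summary)
```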

6. Customize per project when needed

Typical project-local customization points are:

  • edit AGENTS.md for project-specific collaboration rules
  • add or override skills in .opencode/skills/
  • update 00_requirements/intake.md as requirements change
  • keep local execution defaults in config/config.yaml

This makes the workflow convenient for OpenCode: each new development request can live in its own self-contained project folder, with its own agent rules, skills, artifacts, and reports.

Run the Agent System

# Interactive mode — provide a task description
flexbio-pipa run --task "Develop a variant calling pipeline for WGS data"

# Run with a YAML task file
flexbio-pipa run --task-file examples/wgs_variant_calling.yaml

# List available agents
flexbio-pipa list-agents

# Run only the planning agent
flexbio-pipa plan --task "RNA-seq differential expression analysis"

Python API

from flexbio_pipa.agents import PlanAgent, BuildAgent
from flexbio_pipa.utils.config import load_config

config = load_config("config/config.yaml")

# Create a plan for a bioinformatics task
plan_agent = PlanAgent(config=config)
plan = plan_agent.run("Develop a metagenomics classification pipeline")
print(plan.goals)
print(plan.steps)

# Execute the plan — build agents, search literature, collect tools
build_agent = BuildAgent(config=config)
result = build_agent.run(plan)

Architecture

flexbio_pipa/
├── agents/
│   ├── base.py                       # BaseAgent, AgentResult, Message
│   ├── sub_agent.py                  # SubAgent base class
│   ├── plan_agent.py                 # PlanAgent — goal setting & planning
│   ├── build_agent.py                # BuildAgent — tool/workflow development
│   ├── research_orchestrator_agent.py # ResearchOrchestratorAgent — end-to-end orchestration
│   ├── literature_search_agent.py    # PubMed / arXiv / bioRxiv search
│   ├── tool_collection_agent.py      # Tool/workflow collection
│   ├── benchmark_agent.py            # Benchmark result collection
│   ├── test_plan_agent.py            # Test plan design & validation
│   └── workflow_agent.py             # Snakemake pipeline generation
├── research/
│   ├── workspace.py       # Project scaffolding, agent templates, skill installation
│   ├── skills.py          # ScientificSkillRegistry — local skill audit & discovery
│   └── artifacts.py       # Structured artifact I/O for stage directories
├── execution/
│   ├── base.py            # ExecutionProfile ABC
│   ├── local.py           # LocalExecutionProfile — Snakemake dry-run / local execution
│   └── parser.py          # Execution output parsing and error extraction
├── tools/
│   ├── pubmed.py          # PubMed E-utilities wrapper
│   ├── arxiv.py           # arXiv API wrapper
│   ├── conda.py           # Conda/Bioconda package search
│   ├── galaxy.py          # Galaxy ToolShed search
│   ├── snakemake_hub.py   # Snakemake workflow hub search
│   └── code_generator.py  # LLM-based code generation
├── workflows/
│   ├── base.py            # Workflow base class
│   └── snakemake.py       # Snakemake workflow builder
├── utils/
│   ├── config.py          # YAML configuration loader with ${ENV_VAR} resolution
│   ├── llm.py             # LLM client (OpenAI-compatible)
│   └── logger.py          # Rich-based logger
└── cli.py                 # Click CLI — init-project, research-run, run, plan, list-agents
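
As an illustration of what a wrapper such as tools/pubmed.py has to do, the sketch below builds a PubMed ESearch request URL for the NCBI E-utilities service. It is not taken from the FlexBio-PIPA source. `pubmed_esearch_url` is a hypothetical function, and the default of 20 results simply mirrors `literature_search.max_results` in the example configuration.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_esearch_url(term: str, max_results: int = 20) -> str:
    """Build an ESearch URL that asks PubMed for matching IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Only the URL is built here; no network request is made.
url = pubmed_esearch_url("RNA-seq differential expression")
print(url)
```

The returned URL could then be fetched with an HTTP client such as requests, ideally with a timeout and retry policy.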

OpenCode-Native Project Layout

When init-project scaffolds a new project, it creates:

my-pipeline-project/
├── AGENTS.md                        # Project-level agent instructions
├── .opencode/
│   ├── agents/                      # One .md per FlexBio-PIPA agent (7 agents)
│   │   ├── research-orchestrator.md
│   │   ├── pipeline-planner.md
│   │   ├── literature-search.md
│   │   ├── tool-collection.md
│   │   ├── benchmark.md
│   │   ├── workflow.md
│   │   └── test-plan.md
│   ├── skills/                      # Curated scientific skills (up to 19)
│   │   ├── flexbio-pipa-project-workspace/
│   │   ├── pubmed-database/
│   │   ├── arxiv-database/
│   │   ├── biorxiv-database/
│   │   ├── literature-review/
│   │   ├── biopython/
│   │   ├── ...
│   │   └── scientific-writing/
│   └── tools/                       # Custom tool definitions (placeholder)
├── 00_requirements/                 # Intake, requirement freeze, skill audit
├── 01_literature/                   # Search strategies, evidence summaries
├── 02_tools/                        # Tool candidates, shortlist
├── 03_implementation/               # Snakefile, config, scripts, envs
├── 04_tests/                        # Test plan, datasets, validation rules
└── 05_reports/                      # Execution reports, final recommendation

Scientific Skills

FlexBio-PIPA integrates 19 curated scientific skills from K-Dense-AI/claude-scientific-skills to give each agent domain-specific knowledge. Skills are installed into .opencode/skills/ during init-project and scoped per-agent so each agent only sees relevant skills.

Agent-to-Skill Mapping

| Agent | Skills |
| --- | --- |
| research-orchestrator | scientific-writing, scientific-brainstorming |
| pipeline-planner | scientific-brainstorming, hypothesis-generation |
| literature-search | pubmed-database, arxiv-database, biorxiv-database, literature-review |
| tool-collection | gget, ensembl-database, bioservices |
| benchmark | scientific-critical-thinking, peer-review |
| workflow | biopython, pysam, deeptools, pydeseq2, scanpy, scikit-bio |
| test-plan | scientific-critical-thinking |

All agents also have access to the built-in flexbio-pipa-project-workspace skill, which describes the project layout and conventions.
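
The mapping can be turned into a per-agent allow list. The sketch below is illustrative only: `skill_permissions` is a hypothetical helper, and the rule shape (a `"*": "deny"` default plus `"allow"` entries) is an assumption about how a permission.skill block might be expressed, not a confirmed schema.

```python
from typing import Dict, List

WORKSPACE_SKILL = "flexbio-pipa-project-workspace"

# A subset of the agent-to-skill mapping, copied from the table.
AGENT_SKILLS: Dict[str, List[str]] = {
    "literature-search": [
        "pubmed-database", "arxiv-database", "biorxiv-database", "literature-review",
    ],
    "benchmark": ["scientific-critical-thinking", "peer-review"],
    "test-plan": ["scientific-critical-thinking"],
}


def skill_permissions(agent: str) -> Dict[str, str]:
    """Deny every skill by default, then allow the workspace skill and mapped skills."""
    rules = {"*": "deny"}
    for skill in [WORKSPACE_SKILL, *AGENT_SKILLS[agent]]:
        rules[skill] = "allow"
    return rules


# Assumed frontmatter shape; it would be serialized to YAML in the agent's .md file.
frontmatter = {"permission": {"skill": skill_permissions("benchmark")}}
print(frontmatter)
```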

How It Works

  1. Set research.scientific_skills_root in your config to point at a local clone of the scientific skills repository:

    research:
      scientific_skills_root: "/path/to/claude-scientific-skills/scientific-skills"

  2. Run flexbio-pipa init-project. The scaffolder copies each skill's SKILL.md plus any references/, scripts/, and assets/ subdirectories into the project's .opencode/skills/<name>/ folder.

  3. Each agent's frontmatter includes a permission.skill block that allows only its mapped skills and denies everything else. This keeps agent context focused and prevents irrelevant skill loading.

Existing skills in the target directory are never overwritten, so project-local customizations are preserved across re-runs.
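
The copy-without-overwrite behavior can be sketched as follows. This is not the scaffolder's code: `install_skill` is a hypothetical function that copies SKILL.md and the three optional subdirectories named above, and skips any skill whose target directory already exists.

```python
import shutil
import tempfile
from pathlib import Path

OPTIONAL_SUBDIRS = ("references", "scripts", "assets")


def install_skill(source: Path, skills_dir: Path) -> bool:
    """Copy one skill into skills_dir; return False if it already exists there."""
    target = skills_dir / source.name
    if target.exists():
        return False  # keep project-local customizations untouched
    target.mkdir(parents=True)
    shutil.copy2(source / "SKILL.md", target / "SKILL.md")
    for name in OPTIONAL_SUBDIRS:
        if (source / name).is_dir():
            shutil.copytree(source / name, target / name)
    return True


# Demo: install once, customize the copy, then re-run the installation.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "shared" / "biopython"
    (src / "references").mkdir(parents=True)
    (src / "SKILL.md").write_text("# biopython\n")
    dest = Path(tmp) / "project" / ".opencode" / "skills"
    first = install_skill(src, dest)
    (dest / "biopython" / "SKILL.md").write_text("# customized\n")
    second = install_skill(src, dest)
    kept = (dest / "biopython" / "SKILL.md").read_text()
    print(first, second, kept.strip())
```

The second call returns False and the customized SKILL.md survives, which matches the re-run guarantee described above.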

Skill Precedence

  1. Project-local .opencode/skills/ (highest priority)
  2. Scientific skills installed from research.scientific_skills_root
  3. Repo-default skills bundled with FlexBio-PIPA
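
One way to read this ordering is as a first-match lookup over three roots. The sketch below assumes each skill is a directory containing SKILL.md. `resolve_skill` and the directory names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def resolve_skill(name: str, roots: List[Path]) -> Optional[Path]:
    """Return the first root (highest priority first) that contains <name>/SKILL.md."""
    for root in roots:
        candidate = root / name
        if (candidate / "SKILL.md").is_file():
            return candidate
    return None


# Demo: three roots in precedence order, with overlapping skills.
with tempfile.TemporaryDirectory() as tmp:
    project, shared, bundled = (Path(tmp) / n for n in ("project", "shared", "bundled"))
    layout = ((project, ["pysam"]), (shared, ["pysam", "biopython"]), (bundled, ["biopython"]))
    for root, names in layout:
        for skill in names:
            (root / skill).mkdir(parents=True)
            (root / skill / "SKILL.md").write_text(f"# {skill}\n")
    roots = [project, shared, bundled]
    pysam_root = resolve_skill("pysam", roots).parent.name
    biopython_root = resolve_skill("biopython", roots).parent.name
    not_found = resolve_skill("scanpy", roots)
    print(pysam_root, biopython_root, not_found)
```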

Configuration

config/config.yaml:

llm:
  base_url: "https://api.openai.com/v1"   # or your OpenCode/Ollama URL
  api_key: "${OPENAI_API_KEY}"
  model: "gpt-4o"
  temperature: 0.2
  max_tokens: 4096
  timeout: 120

agents:
  # OpenCode agent model assignment (used by init-project)
  opencode_agents:
    models:
      strong: "github-copilot/claude-opus-4.6"    # orchestrator, planner, workflow
      fast: "github-copilot/gpt-5.4-mini"     # lit-search, tool-collection, benchmark, test-plan
    # Per-agent overrides (optional):
    # overrides:
    #   workflow: "github-copilot/claude-sonnet-4"

  plan:
    max_iterations: 3
  build:
    max_iterations: 10
  research_orchestrator:
    max_stages: 8
  literature_search:
    max_results: 20
    databases: ["pubmed", "arxiv"]
  tool_collection:
    sources: ["bioconda", "galaxy", "snakemake_hub"]
    max_results: 10
  benchmark:
    max_results: 10
  test_plan:
    test_data_sources: ["sra", "zenodo"]
  workflow:
    engine: snakemake

research:
  workspace_root: "workspace/research"
  scientific_skills_root: "/path/to/claude-scientific-skills/scientific-skills"
  execution_profile: "local"
  auto_execute: false
  dry_run: true
  max_refinement_cycles: 1
  snakemake:
    cores: 1

logging:
  level: "INFO"
  file: null
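
To show how the `research` settings might translate into a Snakemake invocation, the sketch below maps `dry_run` and `snakemake.cores` to Snakemake's `--dry-run` and `--cores` options. It is an assumption about the mapping, not the LocalExecutionProfile implementation. `snakemake_argv` and the default Snakefile path are hypothetical, and `auto_execute` is not modeled.

```python
from typing import Any, Dict, List


def snakemake_argv(
    research: Dict[str, Any],
    snakefile: str = "03_implementation/Snakefile",  # assumed location
) -> List[str]:
    """Translate the research settings into a Snakemake command line."""
    cores = research.get("snakemake", {}).get("cores", 1)
    argv = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if research.get("dry_run", True):
        argv.append("--dry-run")  # plan the jobs without running them
    return argv


research = {"dry_run": True, "snakemake": {"cores": 1}}
print(" ".join(snakemake_argv(research)))
```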

Model Tier Assignment

The agents.opencode_agents.models section controls which LLM model is written into each agent's .md frontmatter during init-project:

| Tier | Default | Agents |
| --- | --- | --- |
| strong | claude-opus-4.6 | research-orchestrator, pipeline-planner, workflow |
| fast | gpt-5.4-mini | literature-search, tool-collection, benchmark, test-plan |

Use agents.opencode_agents.overrides to assign a specific model to any individual agent.
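
The resolution order implied here (per-agent override first, then the agent's tier) can be sketched in a few lines. `model_for` and `AGENT_TIERS` are hypothetical names; the tier membership and model strings are copied from the configuration example in this README.

```python
from typing import Dict, Optional

AGENT_TIERS = {
    "research-orchestrator": "strong",
    "pipeline-planner": "strong",
    "workflow": "strong",
    "literature-search": "fast",
    "tool-collection": "fast",
    "benchmark": "fast",
    "test-plan": "fast",
}


def model_for(agent: str, models: Dict[str, str],
              overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick the override for an agent if present, otherwise its tier's model."""
    if overrides and agent in overrides:
        return overrides[agent]
    return models[AGENT_TIERS[agent]]


models = {"strong": "github-copilot/claude-opus-4.6", "fast": "github-copilot/gpt-5.4-mini"}
overrides = {"workflow": "github-copilot/claude-sonnet-4"}
print(model_for("workflow", models, overrides))
print(model_for("benchmark", models, overrides))
```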


Development

pip install -e ".[dev]"

# Run the full test suite (91 tests)
pytest tests/ -v

# Lint and type-check
ruff check src/
mypy src/

# Format
black src/ tests/

Key Dependencies

| Package | Purpose |
| --- | --- |
| openai | LLM chat completion API |
| click | CLI framework |
| requests | HTTP for PubMed/arXiv/Conda/Galaxy |
| tenacity | Retry with exponential backoff |
| pyyaml | YAML config loading |
| rich | Terminal output and logging |
| pydantic | Data validation |
| jinja2 | Templating |

Dev: pytest, pytest-cov, pytest-mock, responses, ruff, black, mypy.


License

MIT
