FlexBio-PIPA

Flexible Bioinformatics Personal Intelligent Pipeline Agent

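A terminal-agent-based workflow for developing bioinformatics analysis tools and pipelines across multiple scenarios and customized requirements.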

Overview

FlexBio-PIPA is a multi-agent system that automates the full lifecycle of bioinformatics pipeline development. A Research Orchestrator coordinates specialised sub-agents through a stage-gate workflow:

Research Orchestrator (ResearchOrchestratorAgent) — drives the end-to-end pipeline: requirements freeze, literature search, tool collection, benchmarking, Snakemake implementation, testing, and iterative refinement.
Plan (PlanAgent) — set goals and decompose tasks into structured plans.
Literature Search (LiteratureSearchAgent) — search PubMed, arXiv, and bioRxiv for relevant papers and methods.
Tool Collection (ToolCollectionAgent) — search Bioconda, Galaxy ToolShed, and Snakemake Hub for existing tools and workflows.
Benchmark (BenchmarkAgent) — gather benchmark results and performance comparisons for competing methods.
Workflow (WorkflowAgent) — generate or refine Snakemake pipelines.
Build (BuildAgent) — autonomously build custom tools/workflows when ready-made solutions are unavailable.
Test Plan (TestPlanAgent) — design test plans with datasets and validation rules.
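
The stage-gate idea can be pictured as a loop in which each stage must pass its gate before the next one starts. The sketch below is illustrative only: `Stage`, `run_stage_gate`, and the gate callables are hypothetical names, not FlexBio-PIPA's actual orchestrator API, and the retry policy is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str
    run: Callable[[Dict], Dict]    # produces artifacts from the shared context
    gate: Callable[[Dict], bool]   # decides whether the stage output is acceptable


def run_stage_gate(stages: List[Stage], context: Dict, max_retries: int = 1) -> List[str]:
    """Run stages in order; stop at the first stage whose gate keeps failing."""
    completed: List[str] = []
    for stage in stages:
        for _attempt in range(max_retries + 1):
            context.update(stage.run(context))
            if stage.gate(context):
                completed.append(stage.name)
                break
        else:
            # The gate never passed, so later stages are not started.
            return completed
    return completed


# Hypothetical two-stage example: the second gate fails because no papers were found.
stages = [
    Stage("requirements_freeze", lambda ctx: {"scope": "RNA-seq DE"}, lambda ctx: "scope" in ctx),
    Stage("literature_search", lambda ctx: {"papers": []}, lambda ctx: len(ctx["papers"]) > 0),
]
print(run_stage_gate(stages, {}))  # ['requirements_freeze']
```

Keeping the gate separate from the stage body makes it possible to tighten acceptance criteria without touching the agent that produces the artifacts.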

Each agent is enhanced with curated scientific skills drawn from K-Dense-AI/claude-scientific-skills (19 skills covering literature databases, bioinformatics libraries, and scientific methodology). Skills are automatically installed during project scaffolding and scoped per-agent via permission.skill frontmatter.

The LLM backend is OpenAI API-compatible, making it usable with:

Backend	Notes
OpenCode	Primary target (native agent support)
Claude Code	Theoretically compatible
Codex / OpenAI	Theoretically compatible
Any OpenAI-compatible server	e.g. Ollama, LM Studio

Quick Start

Installation

pip install flexbio-pipa
# or from source:
git clone https://github.com/WatchmanGu/FlexBio-PIPA.git
cd FlexBio-PIPA
pip install -e ".[dev]"

Configuration

Copy the example configuration and set your API key:

cp config/config.example.yaml config/config.yaml
export OPENAI_API_KEY="your-api-key"
# or set in config/config.yaml

Containerized Setup

FlexBio-PIPA can run inside any container runtime as long as the project workspace is mounted and the OpenCode-compatible LLM endpoint is reachable from inside the container.

Run the app container

Mount the project directory as a writable volume so generated artifacts stay on the host.
Mount or inject config/config.yaml and any secrets the same way you would for a local run.
Run commands from the mounted project root so AGENTS.md, .opencode/, and the stage directories stay aligned.

<container-runtime> run --rm \
  -v "$PWD:/workspace" \
  -w /workspace \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  <flexbio-pipa-image> \
  flexbio-pipa research-run --project-dir /workspace --task "..."

Configure the OpenCode runtime

Start an OpenCode-compatible server separately, for example with opencode serve.
Set llm.base_url to the reachable endpoint exposed by that runtime.
If the runtime is outside the container, replace localhost with the host or service name visible from the container.
Keep the project-local OpenCode layout on the mounted workspace so OpenCode can read AGENTS.md, .opencode/agents/, .opencode/skills/, and .opencode/tools/.

llm:
  base_url: "http://<reachable-host>:4096/v1"
  api_key: "${OPENAI_API_KEY}"
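
Inside a container, `localhost` refers to the container itself, which is why the host part of `llm.base_url` may need rewriting. A minimal sketch of that substitution follows; `container_base_url` is a hypothetical helper, and `host.docker.internal` is used only as an example of a host name that some runtimes expose. URLs with embedded credentials are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def container_base_url(base_url: str, reachable_host: str) -> str:
    """Replace a loopback host in base_url with a host visible from the container."""
    parts = urlsplit(base_url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return base_url  # already points at a non-loopback host
    netloc = reachable_host if parts.port is None else f"{reachable_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(container_base_url("http://localhost:4096/v1", "host.docker.internal"))
# http://host.docker.internal:4096/v1
```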

OpenCode Project-Local Workflow (Recommended)

If you usually create a new directory for each client request or pipeline project, FlexBio-PIPA now supports a project-local OpenCode layout.

The idea is simple:

each project gets its own AGENTS.md
each project gets its own .opencode/skills/
each project keeps its own requirements, literature, tools, implementation, tests, and reports in one place

1. Create a project directory and local Python environment

mkdir my-rnaseq-project
cd my-rnaseq-project

python -m venv .venv
source .venv/bin/activate

# Recommended while developing the agent team from source
pip install -e /path/to/FlexBio-PIPA

# Or install from a package release instead
# pip install flexbio-pipa

2. Add a project-local config file

You can use either config/config.yaml or config.yaml inside the project directory.

If you want a minimal starting point, copy the included templates:

mkdir -p config
cp /path/to/FlexBio-PIPA/examples/opencode_project/config.yaml config/config.yaml
cp /path/to/FlexBio-PIPA/examples/opencode_project/project.yaml ./project.yaml

Example config/config.yaml:

llm:
  base_url: "https://api.openai.com/v1"
  api_key: "${OPENAI_API_KEY}"
  model: "gpt-4o"

research:
  execution_profile: "local"
  auto_execute: false
  dry_run: true
  max_refinement_cycles: 1
  scientific_skills_root: "/path/to/claude-scientific-skills/scientific-skills"

Then export your API key if needed:

export OPENAI_API_KEY="your-api-key"
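
The `${OPENAI_API_KEY}` placeholder in the config is resolved from the environment when the file is loaded. The sketch below shows one way such resolution could work; `resolve_env` is a hypothetical function, not the project's `load_config`, and leaving unset variables untouched (rather than raising) is an assumption.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, env=None):
    """Recursively replace ${VAR} placeholders in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Leave the placeholder untouched when the variable is not set.
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    return value


config = {"llm": {"api_key": "${OPENAI_API_KEY}", "model": "gpt-4o"}}
print(resolve_env(config, env={"OPENAI_API_KEY": "sk-example"}))
# {'llm': {'api_key': 'sk-example', 'model': 'gpt-4o'}}
```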

3. Bootstrap the project for OpenCode

# from the task text directly
flexbio-pipa init-project \
  --project-dir . \
  --task "Build an RNA-seq differential expression pipeline"

# or from the example task template you copied
flexbio-pipa init-project \
  --project-dir . \
  --task-file project.yaml

This creates:

AGENTS.md - project-level agent instructions for OpenCode
.opencode/agents/ - project-local agent definitions (one .md per FlexBio-PIPA agent)
.opencode/skills/ - project-local skill overrides
.opencode/tools/ - project-local custom tools
00_requirements/ through 05_reports/ - the working artifact layout
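
A quick way to confirm that the scaffold is complete is to check for the paths listed above. This is a small sketch, not part of the FlexBio-PIPA CLI; `missing_scaffold_paths` is a hypothetical helper, and the intermediate stage directory names follow the artifact layout described in this README.

```python
from pathlib import Path
from typing import List

EXPECTED_PATHS = [
    "AGENTS.md",
    ".opencode/agents",
    ".opencode/skills",
    ".opencode/tools",
    "00_requirements",
    "01_literature",
    "02_tools",
    "03_implementation",
    "04_tests",
    "05_reports",
]


def missing_scaffold_paths(project_dir: str) -> List[str]:
    """Return the expected scaffold paths that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


missing = missing_scaffold_paths(".")
print("scaffold complete" if not missing else f"missing: {missing}")
```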

4. Open the project in OpenCode

Start OpenCode in the project directory you just initialized.

Important project files:

AGENTS.md - project-specific working rules
.opencode/agents/ - project-specific agent definitions for OpenCode
.opencode/skills/ - project-specific skills you want OpenCode to use
.opencode/tools/ - project-specific custom tools
00_requirements/intake.md - the main task description for the project

Skill precedence is:

project-local .opencode/skills/
shared scientific skills from research.scientific_skills_root
repo-default skills bundled with FlexBio-PIPA
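
The precedence order amounts to a first-match lookup across three roots. The following sketch illustrates that lookup under stated assumptions: `resolve_skill` is a hypothetical function, and the location of the repo-default skills is a placeholder, since this README does not say where they live.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_skill(name: str, roots: Iterable[Path]) -> Optional[Path]:
    """Return the first <root>/<name>/SKILL.md found, searching roots in priority order."""
    for root in roots:
        candidate = root / name / "SKILL.md"
        if candidate.is_file():
            return candidate
    return None


# Highest priority first; the last two paths are placeholders.
roots = [
    Path(".opencode/skills"),
    Path("/path/to/claude-scientific-skills/scientific-skills"),
    Path("/path/to/repo-default-skills"),
]
print(resolve_skill("pubmed-database", roots))
```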

5. Run the research harness inside that project

# Use the task already stored in 00_requirements/intake.md
flexbio-pipa research-run --project-dir . --no-execute

# Or use the project task template directly
flexbio-pipa research-run --project-dir . --task-file project.yaml --no-execute

# Or override the task for a specific run
flexbio-pipa research-run \
  --project-dir . \
  --task "Build a metagenomic assembly pipeline" \
  --no-execute

# Run local execution as well
flexbio-pipa research-run --project-dir . --execute

During a run, FlexBio-PIPA writes structured artifacts into:

00_requirements/ - frozen scope, clarifications, skill audit
01_literature/ - search outputs and benchmark evidence
02_tools/ - tool candidates and shortlist
03_implementation/ - Snakemake files and implementation artifacts
04_tests/ - test plan and validation inputs
05_reports/ - execution analysis, revision requests, final summary

6. Customize per project when needed

Typical project-local customization points are:

edit AGENTS.md for project-specific collaboration rules
add or override skills in .opencode/skills/
update 00_requirements/intake.md as requirements change
keep local execution defaults in config/config.yaml

This makes the workflow convenient for OpenCode: each new development request can live in its own self-contained project folder, with its own agent rules, skills, artifacts, and reports.

Run the Agent System

# Interactive mode — provide a task description
flexbio-pipa run --task "Develop a variant calling pipeline for WGS data"

# Run with a YAML task file
flexbio-pipa run --task-file examples/wgs_variant_calling.yaml

# List available agents
flexbio-pipa list-agents

# Run only the planning agent
flexbio-pipa plan --task "RNA-seq differential expression analysis"

Python API

from flexbio_pipa.agents import PlanAgent, BuildAgent
from flexbio_pipa.utils.config import load_config

config = load_config("config/config.yaml")

# Create a plan for a bioinformatics task
plan_agent = PlanAgent(config=config)
plan = plan_agent.run("Develop a metagenomics classification pipeline")
print(plan.goals)
print(plan.steps)

# Execute the plan — build agents, search literature, collect tools
build_agent = BuildAgent(config=config)
result = build_agent.run(plan)

Architecture

flexbio_pipa/
├── agents/
│   ├── base.py                       # BaseAgent, AgentResult, Message
│   ├── sub_agent.py                  # SubAgent base class
│   ├── plan_agent.py                 # PlanAgent — goal setting & planning
│   ├── build_agent.py                # BuildAgent — tool/workflow development
│   ├── research_orchestrator_agent.py # ResearchOrchestratorAgent — end-to-end orchestration
│   ├── literature_search_agent.py    # PubMed / arXiv / bioRxiv search
│   ├── tool_collection_agent.py      # Tool/workflow collection
│   ├── benchmark_agent.py            # Benchmark result collection
│   ├── test_plan_agent.py            # Test plan design & validation
│   └── workflow_agent.py             # Snakemake pipeline generation
├── research/
│   ├── workspace.py       # Project scaffolding, agent templates, skill installation
│   ├── skills.py          # ScientificSkillRegistry — local skill audit & discovery
│   └── artifacts.py       # Structured artifact I/O for stage directories
├── execution/
│   ├── base.py            # ExecutionProfile ABC
│   ├── local.py           # LocalExecutionProfile — Snakemake dry-run / local execution
│   └── parser.py          # Execution output parsing and error extraction
├── tools/
│   ├── pubmed.py          # PubMed E-utilities wrapper
│   ├── arxiv.py           # arXiv API wrapper
│   ├── conda.py           # Conda/Bioconda package search
│   ├── galaxy.py          # Galaxy ToolShed search
│   ├── snakemake_hub.py   # Snakemake workflow hub search
│   └── code_generator.py  # LLM-based code generation
├── workflows/
│   ├── base.py            # Workflow base class
│   └── snakemake.py       # Snakemake workflow builder
├── utils/
│   ├── config.py          # YAML configuration loader with ${ENV_VAR} resolution
│   ├── llm.py             # LLM client (OpenAI-compatible)
│   └── logger.py          # Rich-based logger
└── cli.py                 # Click CLI — init-project, research-run, run, plan, list-agents

OpenCode-Native Project Layout

When init-project scaffolds a new project, it creates:

my-pipeline-project/
├── AGENTS.md                        # Project-level agent instructions
├── .opencode/
│   ├── agents/                      # One .md per FlexBio-PIPA agent (7 agents)
│   │   ├── research-orchestrator.md
│   │   ├── pipeline-planner.md
│   │   ├── literature-search.md
│   │   ├── tool-collection.md
│   │   ├── benchmark.md
│   │   ├── workflow.md
│   │   └── test-plan.md
│   ├── skills/                      # Curated scientific skills (up to 19)
│   │   ├── flexbio-pipa-project-workspace/
│   │   ├── pubmed-database/
│   │   ├── arxiv-database/
│   │   ├── biorxiv-database/
│   │   ├── literature-review/
│   │   ├── biopython/
│   │   ├── ...
│   │   └── scientific-writing/
│   └── tools/                       # Custom tool definitions (placeholder)
├── 00_requirements/                 # Intake, requirement freeze, skill audit
├── 01_literature/                   # Search strategies, evidence summaries
├── 02_tools/                        # Tool candidates, shortlist
├── 03_implementation/               # Snakefile, config, scripts, envs
├── 04_tests/                        # Test plan, datasets, validation rules
└── 05_reports/                      # Execution reports, final recommendation

Scientific Skills

FlexBio-PIPA integrates 19 curated scientific skills from K-Dense-AI/claude-scientific-skills to give each agent domain-specific knowledge. Skills are installed into .opencode/skills/ during init-project and scoped per-agent so each agent only sees relevant skills.

Agent-to-Skill Mapping

Agent	Skills
research-orchestrator	`scientific-writing`, `scientific-brainstorming`
pipeline-planner	`scientific-brainstorming`, `hypothesis-generation`
literature-search	`pubmed-database`, `arxiv-database`, `biorxiv-database`, `literature-review`
tool-collection	`gget`, `ensembl-database`, `bioservices`
benchmark	`scientific-critical-thinking`, `peer-review`
workflow	`biopython`, `pysam`, `deeptools`, `pydeseq2`, `scanpy`, `scikit-bio`
test-plan	`scientific-critical-thinking`

All agents also have access to the built-in flexbio-pipa-project-workspace skill, which describes the project layout and conventions.

How It Works

Set research.scientific_skills_root in your config to point at a local clone of the scientific skills repository:
```
research:
  scientific_skills_root: "/path/to/claude-scientific-skills/scientific-skills"
```
Run flexbio-pipa init-project. The scaffolder copies each skill's SKILL.md plus any references/, scripts/, and assets/ subdirectories into the project's .opencode/skills/<name>/ folder.
Each agent's frontmatter includes a permission.skill block that allows only its mapped skills and denies everything else. This keeps agent context focused and prevents irrelevant skill loading.
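
The allow-list behaviour described above can be sketched as a deny-by-default rule set. The code below is an illustration only: the function names are hypothetical, and the rendered layout (a `"*": deny` wildcard followed by per-skill `allow` entries under `permission.skill`) is an assumption about the frontmatter format rather than a confirmed schema.

```python
from typing import Dict, Iterable


def skill_permissions(allowed: Iterable[str]) -> Dict[str, str]:
    """Deny every skill by default, then allow only the mapped ones."""
    rules = {"*": "deny"}
    for name in allowed:
        rules[name] = "allow"
    return rules


def render_frontmatter(agent_skills: Iterable[str]) -> str:
    """Render the rule set as an indented permission.skill block."""
    lines = ["permission:", "  skill:"]
    for pattern, action in skill_permissions(agent_skills).items():
        lines.append(f'    "{pattern}": {action}')
    return "\n".join(lines)


# Skills for the test-plan agent, taken from the mapping table.
print(render_frontmatter(["flexbio-pipa-project-workspace", "scientific-critical-thinking"]))
```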

Existing skills in the target directory are never overwritten, so project-local customizations are preserved across re-runs.
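
The copy-without-overwrite behaviour could look roughly like the sketch below. It is not the project's scaffolder: `install_skills` is a hypothetical function, and skipping skills that lack a `SKILL.md` upstream is an assumption.

```python
import shutil
from pathlib import Path
from typing import List

SUBDIRS = ("references", "scripts", "assets")


def install_skills(source_root: Path, target_root: Path, names: List[str]) -> List[str]:
    """Copy SKILL.md and known subdirectories for each skill not already installed."""
    installed: List[str] = []
    target_root.mkdir(parents=True, exist_ok=True)
    for name in names:
        src, dst = source_root / name, target_root / name
        if dst.exists() or not (src / "SKILL.md").is_file():
            continue  # keep local customizations; skip skills missing upstream
        dst.mkdir()
        shutil.copy2(src / "SKILL.md", dst / "SKILL.md")
        for sub in SUBDIRS:
            if (src / sub).is_dir():
                shutil.copytree(src / sub, dst / sub)
        installed.append(name)
    return installed
```

Because the check is on the skill directory as a whole, a second call with the same arguments installs nothing and leaves edited files untouched.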

Skill Precedence

Project-local .opencode/skills/ (highest priority)
Scientific skills installed from research.scientific_skills_root
Repo-default skills bundled with FlexBio-PIPA

Configuration

config/config.yaml:

llm:
  base_url: "https://api.openai.com/v1"   # or your OpenCode/Ollama URL
  api_key: "${OPENAI_API_KEY}"
  model: "gpt-4o"
  temperature: 0.2
  max_tokens: 4096
  timeout: 120

agents:
  # OpenCode agent model assignment (used by init-project)
  opencode_agents:
    models:
      strong: "github-copilot/claude-opus-4.6"    # orchestrator, planner, workflow
      fast: "github-copilot/gpt-5.4-mini"     # lit-search, tool-collection, benchmark, test-plan
    # Per-agent overrides (optional):
    # overrides:
    #   workflow: "github-copilot/claude-sonnet-4"

  plan:
    max_iterations: 3
  build:
    max_iterations: 10
  research_orchestrator:
    max_stages: 8
  literature_search:
    max_results: 20
    databases: ["pubmed", "arxiv"]
  tool_collection:
    sources: ["bioconda", "galaxy", "snakemake_hub"]
    max_results: 10
  benchmark:
    max_results: 10
  test_plan:
    test_data_sources: ["sra", "zenodo"]
  workflow:
    engine: snakemake

research:
  workspace_root: "workspace/research"
  scientific_skills_root: "/path/to/claude-scientific-skills/scientific-skills"
  execution_profile: "local"
  auto_execute: false
  dry_run: true
  max_refinement_cycles: 1
  snakemake:
    cores: 1

logging:
  level: "INFO"
  file: null
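
To make the `research.dry_run` and `research.snakemake.cores` settings concrete, the sketch below turns them into a Snakemake command line. `snakemake_command` is a hypothetical helper and the Snakefile path is an assumed location; how FlexBio-PIPA's local execution profile actually builds its command, and how `auto_execute` factors in, is not shown here.

```python
from typing import Dict, List


def snakemake_command(research_cfg: Dict, snakefile: str) -> List[str]:
    """Build a snakemake argv list from the research section of the config."""
    cores = research_cfg.get("snakemake", {}).get("cores", 1)
    cmd = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if research_cfg.get("dry_run", True):
        cmd.append("--dry-run")  # plan the jobs without running them
    return cmd


research_cfg = {"dry_run": True, "snakemake": {"cores": 1}}
print(snakemake_command(research_cfg, "03_implementation/Snakefile"))
# ['snakemake', '--snakefile', '03_implementation/Snakefile', '--cores', '1', '--dry-run']
```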

Model Tier Assignment

The agents.opencode_agents.models section controls which LLM model is written into each agent's .md frontmatter during init-project:

Tier	Default	Agents
strong	`github-copilot/claude-opus-4.6`	research-orchestrator, pipeline-planner, workflow
fast	`github-copilot/gpt-5.4-mini`	literature-search, tool-collection, benchmark, test-plan

Use agents.opencode_agents.overrides to assign a specific model to any individual agent.
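
Tier assignment with per-agent overrides reduces to a two-step lookup. The sketch below assumes that overrides are keyed by agent name, as in the commented example in the config; `model_for` and `AGENT_TIERS` are hypothetical names, not part of the package.

```python
from typing import Dict

# Tier membership as listed in the table above.
AGENT_TIERS = {
    "research-orchestrator": "strong",
    "pipeline-planner": "strong",
    "workflow": "strong",
    "literature-search": "fast",
    "tool-collection": "fast",
    "benchmark": "fast",
    "test-plan": "fast",
}


def model_for(agent: str, opencode_agents: Dict) -> str:
    """Return the override for an agent if present, else its tier's model."""
    overrides = opencode_agents.get("overrides") or {}
    if agent in overrides:
        return overrides[agent]
    return opencode_agents["models"][AGENT_TIERS[agent]]


cfg = {
    "models": {
        "strong": "github-copilot/claude-opus-4.6",
        "fast": "github-copilot/gpt-5.4-mini",
    },
    "overrides": {"workflow": "github-copilot/claude-sonnet-4"},
}
print(model_for("workflow", cfg))   # github-copilot/claude-sonnet-4
print(model_for("benchmark", cfg))  # github-copilot/gpt-5.4-mini
```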

Development

pip install -e ".[dev]"

# Run the full test suite (91 tests)
pytest tests/ -v

# Lint and type-check
ruff check src/
mypy src/

# Format
black src/ tests/

Key Dependencies

Package	Purpose
`openai`	LLM chat completion API
`click`	CLI framework
`requests`	HTTP for PubMed/arXiv/Conda/Galaxy
`tenacity`	Retry with exponential backoff
`pyyaml`	YAML config loading
`rich`	Terminal output and logging
`pydantic`	Data validation
`jinja2`	Templating

Dev: pytest, pytest-cov, pytest-mock, responses, ruff, black, mypy.

License

MIT
