> "Model Context Protocol (MCP) connects Claude to third-party tools, and skills teach Claude how to use them well."
>
> From *Extending Claude's capabilities with skills and MCP servers*, Anthropic
As AI agents evolve, there is a growing need for modular, reusable approaches to equip them with domain-specific expertise while mitigating issues like excessive MCP context consumption. To address this, Anthropic introduced Agent Skills as an open standard on December 18, 2025, allowing agents to dynamically load structured instructions and resources for more effective task execution. Although platforms such as OpenAI Codex have adopted this standard, native support remains limited to specific ecosystems.
Many LLM providers have not yet adopted the standard, so this approach remains out of reach for much of the ecosystem. We, the Project Q team at SII, bridge this gap with a lightweight open-source framework that is fully compatible with the Agent Skills standard and works with any LLM provider. Our implementation focuses on the synergy between the Model Context Protocol (MCP) and Skills: MCP provides access to external tools and systems, while Skills provide the procedural knowledge needed to use those tools (including MCP) effectively. On individual tasks, our skill-based implementation reduced token consumption by up to ~20x compared with a pure MCP approach.
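Context savings of this kind come from loading skills lazily. Under the Agent Skills standard, each skill is a directory containing a `SKILL.md` file whose YAML frontmatter carries a `name` and a `description`. An agent can therefore keep only those short fields in its prompt and read the full instructions once a skill is actually needed. The sketch below illustrates that idea for a generic LLM loop. It is not this project's implementation: the `skills/<name>/SKILL.md` layout, the helper names (`load_skill_index`, `render_skill_catalog`, `load_skill_body`), and the simple `key: value` frontmatter parser, used here instead of a full YAML library, are all assumptions for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillStub:
    """Lightweight metadata kept in the prompt for every skill (hypothetical type)."""
    name: str
    description: str
    path: Path  # location of the full SKILL.md, read only on demand


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading '---' lines.

    Assumption: frontmatter values are single-line strings; a real
    implementation would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def load_skill_index(root: Path) -> list[SkillStub]:
    """Collect name and description for every <root>/<skill>/SKILL.md."""
    stubs = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            stubs.append(SkillStub(meta["name"], meta["description"], skill_md))
    return stubs


def render_skill_catalog(stubs: list[SkillStub]) -> str:
    """Short catalog for the system prompt: one line per skill."""
    return "\n".join(f"- {s.name}: {s.description}" for s in stubs)


def load_skill_body(stub: SkillStub) -> str:
    """Full instructions, loaded only after the model selects this skill."""
    return stub.path.read_text(encoding="utf-8")
```

In this sketch only the output of `render_skill_catalog` is sent up front; the much larger skill bodies, and any scripts they reference, enter the context only after the model picks a skill, whereas a pure MCP setup typically exposes every tool definition on every turn.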
This project builds on MCPMark, a comprehensive evaluation suite for assessing the agentic capabilities of frontier models. We extend MCPMark's benchmarks with a skill-based implementation.
- Lightweight Skill Implementation: Built on a minimal framework, so the skill implementation is easy to read and quick to understand
- Skill + MCP Integration: Shows how skills build on MCP's access to external tools and data, and how they coordinate multiple MCP services to complete tasks together
- Agent Skills Standard Compatible: Skills can run directly in Claude and other agent environments that support the standard
- LLM-Provider Agnostic: Not limited to Claude or Codex; any LLM can use skills to work more efficiently
- Python 3.11+
- uv package manager (recommended, for faster installs)
```bash
# Clone the repository
git clone https://github.com/zjtco-yr/open-agent-skills.git
cd open-agent-skills

# Install with uv (faster)
uv pip install -e .
```

Create a `.mcp_env` file with your API credentials:
```bash
# Example: OpenAI
OPENAI_BASE_URL="https://api.openai.com/v1"
OPENAI_API_KEY="sk-..."

# Optional: Notion (only for Notion tasks)
# SOURCE_NOTION_API_KEY="your-source-notion-api-key"
# EVAL_NOTION_API_KEY="your-eval-notion-api-key"
# EVAL_PARENT_PAGE_TITLE="MCPMark Eval Hub"

# Optional: Playwright (only for Playwright tasks)
PLAYWRIGHT_BROWSER="chromium"  # chromium | firefox
PLAYWRIGHT_HEADLESS="True"

# Optional: GitHub (only for GitHub tasks)
GITHUB_TOKENS="token1,token2"  # token pooling for rate limits
GITHUB_EVAL_ORG="your-eval-org"
```

For more detailed environment configuration, service setup, and authentication instructions, please refer to MCPMark.
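To sanity-check `.mcp_env` before starting a run, a small standard-library parser is enough. This is a hypothetical helper, not part of the pipeline (how the pipeline itself loads the file is not described here). It assumes the simple `KEY="value"` format shown above, with full-line `#` comments and optional trailing `# comment` text. The `REQUIRED` mapping is likewise an assumption that mirrors the example file, including the OpenAI-style provider variables; adjust it for other providers.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY="value" lines, skipping blank lines and full-line comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        if value.startswith('"'):
            # Quoted value: keep everything up to the closing quote,
            # which also drops a trailing "# comment".
            end = value.find('"', 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: strip an inline comment.
            value = value.split(" #", 1)[0].strip()
        env[key.strip()] = value
    return env


# Hypothetical mapping of service -> variables, mirroring the example file above.
REQUIRED = {
    "openai": ["OPENAI_BASE_URL", "OPENAI_API_KEY"],
    "notion": ["SOURCE_NOTION_API_KEY", "EVAL_NOTION_API_KEY", "EVAL_PARENT_PAGE_TITLE"],
    "playwright": ["PLAYWRIGHT_BROWSER", "PLAYWRIGHT_HEADLESS"],
    "github": ["GITHUB_TOKENS", "GITHUB_EVAL_ORG"],
}


def missing_vars(env: dict[str, str], services: list[str]) -> list[str]:
    """Return required variables that are absent or empty for the given services."""
    return [key for service in services for key in REQUIRED[service] if not env.get(key)]
```

For example, `missing_vars(parse_env_text(Path(".mcp_env").read_text()), ["openai", "github"])` (with `from pathlib import Path`) lists whatever still needs to be filled in before running GitHub tasks.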
We ran preliminary benchmark evaluations comparing MCP alone with Open-Agent-Skills (Skills with MCP), using Claude-Sonnet-4.5. In these runs, Skills with MCP outperformed MCP alone in both task accuracy and token efficiency.
Pass@1 accuracy:

| Benchmark | MCP | Skills |
|---|---|---|
| GitHub | 29.35% | 43.48% |
| Filesystem | 32.5% | 53.3% |
| Playwright WebArena | 32.14% | 52.38% |
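The per-benchmark gains in this table, and their mean, can be checked directly. The values below are copied from the table (Pass@1, in percent); the dictionary name and layout are just for illustration.

```python
# Pass@1 accuracy (%) from the table: (MCP only, Skills with MCP)
pass_at_1 = {
    "GitHub": (29.35, 43.48),
    "Filesystem": (32.5, 53.3),
    "Playwright WebArena": (32.14, 52.38),
}

# Gain in percentage points for each benchmark
gains = {name: skills - mcp for name, (mcp, skills) in pass_at_1.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} pp")

# Unweighted mean across the three benchmarks
mean_gain = sum(gains.values()) / len(gains)
print(f"Mean gain: +{mean_gain:.2f} pp")  # Mean gain: +18.39 pp
```

The gains are +14.13, +20.80, and +20.24 points, and their unweighted mean of about 18.4 points is the ~18-point average improvement reported in the findings.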
Token consumption on successful tasks:

| Benchmark | MCP Tokens | Skills Tokens | Reduction |
|---|---|---|---|
| GitHub | 7.36M | 4.47M | 39.25% |
| Filesystem | 2.27M | 1.55M | 31.55% |
| Playwright WebArena | 10.25M | 8.27M | 19.30% |
Per-task examples:

| Task | MCP Tokens | Skills Tokens | Reduction |
|---|---|---|---|
| english_talent (Filesystem) | 1.05M | 0.053M | 95.00% (~20x) |
| find_commit_date (GitHub) | 1.21M | 0.23M | 80.84% |
| marketing_customer_analysis (Playwright WebArena) | 1.45M | 0.67M | 54.00% |
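In both token tables, reduction means 1 - Skills / MCP, and the "~20x" figure is the ratio MCP / Skills. Recomputing from the rounded token counts shown gives values within about 0.2 percentage points of those reported; the published percentages were presumably computed from unrounded counts. The sketch below simply reuses the table values.

```python
# Token counts in millions, copied from the two tables: (MCP, Skills)
token_counts = {
    "GitHub": (7.36, 4.47),
    "Filesystem": (2.27, 1.55),
    "Playwright WebArena": (10.25, 8.27),
    "english_talent (Filesystem)": (1.05, 0.053),
    "find_commit_date (GitHub)": (1.21, 0.23),
    "marketing_customer_analysis (Playwright WebArena)": (1.45, 0.67),
}


def reduction(mcp: float, skills: float) -> float:
    """Fraction of MCP tokens saved by the skill-based run."""
    return 1.0 - skills / mcp


for name, (mcp, skills) in token_counts.items():
    print(f"{name}: {reduction(mcp, skills):.2%} fewer tokens, {mcp / skills:.1f}x smaller")
```

For english_talent this gives about 94.95% and 19.8x, consistent with the reported 95.00% (~20x).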
- Significant Accuracy Improvement: Pass@1 accuracy increased by ~18 percentage points on average across the three benchmarks
- Excellent Token Efficiency: Token consumption on successful tasks fell by roughly 20-40% per benchmark, with individual tasks reaching 80-95%
Task: A Reddit task in Playwright WebArena that involves several operations, including account registration, post creation, and forum setup. For the complete task description, see `tasks/playwright_webarena/standard/reddit/budget_europe_travel`.
Result: Using 4 skills, the agent achieved a 2x context reduction.
```bash
# Start the browser server
uv run skills/scripts/reddit/browser_server.py &

# Execute a Reddit task with skills
python -m pipeline \
  --mcp playwright_webarena \
  --models claude-sonnet-4.5 \
  --tasks reddit/budget_europe_travel \
  --exp-name skill-demo \
  --k 1
```

We acknowledge that the current implementation has room for improvement. This project is intended as a starting point to demonstrate the potential of combining MCP with domain-specific skills.
- Balancing Specificity and Generalization: While some of our skills are designed for broad applicability (e.g., Playwright-based skills for registration, posting, and forum creation), others remain highly task-specific (e.g., time-based file classification in the filesystem skills). We will focus on improving skill design patterns to strike a better balance between task-specific performance and cross-domain generalization.
- Skill-MCP Integration: We plan to:
  - Develop better coordination patterns between skills and MCP tools
  - Enable smoother handoffs between skill scripts and MCP operations
- Extended Domain Coverage:
  - Advanced database operations and Notion workspace automation
  - Common daily-life use cases (productivity, personal automation, etc.)
Security is a primary consideration in our skill execution framework. It inherits and leverages several security features from MCPMark, including:
- Directory Isolation: MCP/Skill file operations are confined to specific directories
- Docker Containerization: Tasks run in isolated containers
Even so, we strongly recommend reviewing skill scripts before running them in production, whether you are writing new skills or using existing ones.
- Model Context Protocol (MCP) - Anthropic's protocol for connecting AI assistants to external tools
- Agent Skills Standard - A simple, open format for giving agents new capabilities and expertise.
- Agent Skills Overview - Anthropic's official guide for Claude agent skills
- Skills Docs - Skills integration in Claude Code
- Skills in Codex - Give Codex new capabilities with support for explicit and implicit invocation
After configuring the relevant APIs and Docker environment (if needed), you can run the following commands:
```bash
# Playwright WebArena task
python -m pipeline --mcp playwright_webarena \
  --models claude-sonnet-4.5 \
  --tasks reddit/marketing_customer_analysis \
  --exp-name webarena-test \
  --k 1

# Filesystem task
python -m pipeline --mcp filesystem \
  --models claude-sonnet-4.5 \
  --tasks student_database/english_talent \
  --exp-name filesystem-test \
  --k 1

# GitHub task
python -m pipeline --mcp github \
  --models claude-sonnet-4.5 \
  --tasks build_your_own_x/find_commit_date \
  --exp-name github-test \
  --k 1
```

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Open-Agent-Skills - Making AI agents more effective through domain-expertise skills









