Add WebSearchAPI.ai Provider and MCP Server#1

Open
nazq wants to merge 1 commit into xynehq:main from nazq:mcp_n_websearchapi

nazq commented Dec 25, 2025

Summary

This PR introduces three enhancements to the websearch library:

  1. WebSearchAPI.ai Provider - A new search provider optimized for LLM consumption
  2. Content Extraction Support - Extended SearchResult type to include full page content
  3. MCP Server - Model Context Protocol server for AI assistant integration

These additions expand the library's capabilities while maintaining full backward compatibility with existing code.

Motivation

The original search-sdk provides excellent multi-provider search capabilities. As AI assistants become more prevalent, there's growing demand for:

  • Search results with full page content (not just snippets) for deeper analysis
  • Standardized protocols for AI tools to access search functionality
  • Providers specifically designed to return LLM-ready formatted content

This PR addresses all three needs while respecting the existing architecture and patterns established in the codebase.

Changes

1. WebSearchAPI.ai Provider (src/providers/websearchapi_ai.rs)

A new provider that integrates with WebSearchAPI.ai, which is specifically designed for AI/LLM use cases:

  • Returns full page content as clean markdown
  • Supports domain filtering (include/exclude lists)
  • Configurable content extraction depth
  • Automatic answer generation from search context

use websearch::providers::websearchapi_ai::WebSearchApiProvider;

let provider = WebSearchApiProvider::new("your-api-key")?
    .with_include_domains(vec!["docs.rs", "crates.io"])
    .with_max_content_tokens(5000);

2. Extended SearchResult Type (src/types.rs)

Added optional fields to SearchResult for providers that support content extraction:

pub struct SearchResult {
    // Existing fields unchanged...
    pub url: String,
    pub title: String,
    pub snippet: Option<String>,
    pub domain: Option<String>,
    pub published_date: Option<String>,
    pub provider: Option<String>,
    pub raw: Option<serde_json::Value>,

    // New fields for content extraction
    pub content: Option<String>,        // Full page content
    pub content_format: Option<String>, // "markdown", "text", "html"
    pub word_count: Option<u32>,        // Content length indicator
}

These fields are Option types, so existing providers continue to work unchanged. Providers that support content (WebSearchAPI.ai, Exa, Tavily advanced mode) populate these fields automatically.
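
As a rough illustration of how a consumer might read the new optional fields, here is a minimal sketch. It uses a local mirror of the struct shown above (with `raw` left out so no external crates are needed) and two helpers, `best_text` and `effective_word_count`, which are hypothetical names and not part of this PR. The `Default` derive is also an assumption made for the sketch.

```rust
/// Local mirror of the `SearchResult` fields listed above, for illustration only.
/// The `raw` field is omitted so the sketch needs no external crates, and the
/// `Default` derive is an assumption, not something this PR is known to add.
#[derive(Debug, Default, Clone)]
pub struct SearchResult {
    pub url: String,
    pub title: String,
    pub snippet: Option<String>,
    pub domain: Option<String>,
    pub published_date: Option<String>,
    pub provider: Option<String>,
    pub content: Option<String>,
    pub content_format: Option<String>,
    pub word_count: Option<u32>,
}

/// Hypothetical helper: prefer full page content, fall back to the snippet,
/// then to the title when neither is present.
pub fn best_text(result: &SearchResult) -> &str {
    result
        .content
        .as_deref()
        .or(result.snippet.as_deref())
        .unwrap_or(result.title.as_str())
}

/// Hypothetical helper: use the provider-reported word count when available,
/// otherwise count whitespace-separated words in the best available text.
pub fn effective_word_count(result: &SearchResult) -> usize {
    match result.word_count {
        Some(n) => n as usize,
        None => best_text(result).split_whitespace().count(),
    }
}
```

Code that only reads the older fields is unaffected by the new ones. Code that builds a `SearchResult` with a full struct literal would need to set the three new fields, or use struct-update syntax if the real type provides a suitable default.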

3. MCP Server (src/mcp/, src/bin/websearch_mcp.rs)

A complete Model Context Protocol implementation enabling AI assistants to use the search library:

Features:

  • Dual transport support: stdio (for Claude Desktop) and HTTP (for containerized deployments)
  • Two MCP tools: web_search and list_providers
  • Health check endpoint for container orchestration
  • Configurable default provider via environment variables

Tools exposed:

| Tool | Description |
|------|-------------|
| web_search | Search with configurable provider, max results, and content extraction |
| list_providers | List available providers and their configuration status |

Usage:

# Build
cargo build --release --features mcp --bin websearch-mcp

# Stdio mode (Claude Desktop)
./websearch-mcp

# HTTP mode (Docker/Kubernetes)
./websearch-mcp --transport http --bind-addr 0.0.0.0:3000
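
To make the transport selection above concrete, here is a minimal standard-library sketch of how the `--transport` and `--bind-addr` flags could be interpreted. The `Transport` enum, the `parse_transport` function, and the `127.0.0.1:3000` fallback address are all assumptions for illustration. The actual binary may parse its arguments differently, for example with an argument-parsing crate.

```rust
use std::net::SocketAddr;

/// Hypothetical representation of the two transports described above.
#[derive(Debug, PartialEq)]
pub enum Transport {
    Stdio,
    Http { bind_addr: SocketAddr },
}

/// Parses `--transport <stdio|http>` and `--bind-addr <ip:port>`.
/// Defaults to stdio; the fallback bind address is an assumed value.
pub fn parse_transport<I, S>(args: I) -> Result<Transport, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut transport = String::from("stdio");
    let mut bind_addr = String::from("127.0.0.1:3000"); // assumed default

    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_ref() {
            "--transport" => {
                let value = iter.next().ok_or("--transport needs a value")?;
                transport = value.as_ref().to_string();
            }
            "--bind-addr" => {
                let value = iter.next().ok_or("--bind-addr needs a value")?;
                bind_addr = value.as_ref().to_string();
            }
            other => return Err(format!("unknown argument: {other}")),
        }
    }

    match transport.as_str() {
        "stdio" => Ok(Transport::Stdio),
        "http" => {
            // Reject malformed addresses early instead of failing at bind time.
            let addr = bind_addr
                .parse::<SocketAddr>()
                .map_err(|e| format!("invalid --bind-addr {bind_addr:?}: {e}"))?;
            Ok(Transport::Http { bind_addr: addr })
        }
        other => Err(format!("unsupported transport: {other}")),
    }
}
```

In a real `main`, the arguments would typically come from `std::env::args().skip(1)`, and the resulting `Transport` value would decide whether to serve over stdin/stdout or start an HTTP listener.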

Docker support:

docker build -f Dockerfile.mcp -t websearch-mcp .
docker run -p 3000:3000 -e WEBSEARCHAPI_KEY=xxx websearch-mcp

4. Provider Improvements

  • Exa: Fixed autopromptString field parsing (now optional with #[serde(default)])
  • Tavily: Added support for raw_content field in advanced mode
  • DuckDuckGo: Improved test reliability for web scraping edge cases

Files Changed

| File | Change |
|------|--------|
| Cargo.toml | Added mcp feature with rmcp, schemars, axum dependencies |
| src/lib.rs | Export mcp module, updated doc examples |
| src/types.rs | Added content, content_format, word_count fields |
| src/providers/mod.rs | Added websearchapi_ai module |
| src/providers/websearchapi_ai.rs | New provider implementation |
| src/providers/exa.rs | Fixed autopromptString parsing |
| src/mcp/mod.rs | MCP module structure |
| src/mcp/server.rs | MCP server implementation |
| src/mcp/schemas.rs | JSON schema definitions for MCP tools |
| src/bin/websearch_mcp.rs | MCP binary with transport selection |
| Dockerfile.mcp | Container build for MCP server |
| MCP.md | Comprehensive MCP documentation |
| .cargo/config.toml.example | Example environment configuration |
| tests/provider_integration.rs | Integration tests for new providers |
| tests/cli_tests.rs | Improved test reliability |

Testing

All existing tests pass. New test coverage includes:

  • WebSearchAPI.ai provider unit tests with wiremock
  • Exa provider response parsing tests
  • Tavily provider with content extraction
  • MCP server initialization and tool listing
  • Docker container health check verification

New Integration Test Suite (tests/provider_integration.rs)

Added a comprehensive integration test framework that validates providers against live APIs. Tests are gated by environment variables so they only run when API keys are available:

# Run unit tests only (no API keys needed)
cargo test

# Run integration tests for specific providers
WEBSEARCHAPI_KEY=xxx cargo test --test provider_integration
TAVILY_API_KEY=xxx cargo test --test provider_integration
EXA_API_KEY=xxx cargo test --test provider_integration

# Run all integration tests
WEBSEARCHAPI_KEY=xxx TAVILY_API_KEY=xxx EXA_API_KEY=xxx cargo test --test provider_integration

The integration tests verify:

  • Provider authentication and API connectivity
  • Response parsing and field mapping
  • Content extraction where supported
  • Error handling for invalid credentials
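
The environment-variable gating described above could be sketched roughly as follows. The helper names `normalize_key`, `api_key`, and `run_if_configured` are hypothetical and are not taken from tests/provider_integration.rs; the real test suite may gate its tests in another way.

```rust
use std::env;

/// Treats a missing, empty, or whitespace-only value as "not configured".
pub fn normalize_key(raw: Option<String>) -> Option<String> {
    raw.filter(|value| !value.trim().is_empty())
}

/// Reads an API key such as `WEBSEARCHAPI_KEY` from the environment.
pub fn api_key(var: &str) -> Option<String> {
    normalize_key(env::var(var).ok())
}

/// Hypothetical gate: runs `test_body` with the key when it is configured,
/// otherwise logs a skip message. Returns whether the body actually ran.
pub fn run_if_configured<F>(var: &str, test_body: F) -> bool
where
    F: FnOnce(&str),
{
    match api_key(var) {
        Some(key) => {
            test_body(&key);
            true
        }
        None => {
            eprintln!("skipping: {var} is not set");
            false
        }
    }
}
```

Inside a `#[test]` function, a live-API check would then be wrapped in something like `run_if_configured("TAVILY_API_KEY", |key| { /* call the provider */ })`, so a plain `cargo test` still passes on machines without credentials.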

Backward Compatibility

This PR maintains full backward compatibility:

  • All existing SearchResult consumers continue to work (new fields are Option)
  • No changes to existing provider APIs
  • MCP feature is opt-in (--features mcp)
  • No breaking changes to public interfaces

Documentation

  • MCP.md - Complete guide for MCP server setup and usage
  • Updated inline documentation for new types and functions
  • .cargo/config.toml.example - Template for local development

Future Considerations

Some ideas for potential future work (not in scope for this PR):

  • Additional MCP tools (e.g., fetch_url for direct URL content extraction)
  • Streaming responses for large content
  • Provider-specific MCP tools exposing unique capabilities
  • Building and pushing prebuilt Docker images on each release (happy to open a follow-up PR for this)

Features:
- WebSearchAPI.ai provider with full markdown content extraction
- MCP server supporting stdio (default) and HTTP transport modes
- Docker deployment with health check endpoint
- Exa provider with optional autopromptString parsing
- Tavily provider integration

Includes Dockerfile.mcp for containerized deployments and
comprehensive test coverage for all providers.