diff --git a/README.md b/README.md
index d8bd3dc..c7f91a7 100644
--- a/README.md
+++ b/README.md
@@ -14,11 +14,13 @@ A high-performance Rust library and command-line tool for searching across multi
 ## Features
 
 ### 🏗️ Dual Purpose Design
+
 - **📚 Rust Library**: Integrate web search into your Rust applications
 - **⚡ CLI Binary**: Ready-to-use command-line search tool
 - **🔧 Single Installation**: One `cargo install` command gets you both
 
 ### 🔍 Search Capabilities
+
 - **Multiple Providers**: Unified interface for 8+ search providers
 - **Standardized Results**: Consistent result format across all providers
 - **Multi-Provider Search**: Query multiple search engines simultaneously
@@ -26,12 +28,14 @@ A high-performance Rust library and command-line tool for searching across multi
 - **Result Aggregation**: Combine and merge results from multiple providers
 
 ### 🦀 Rust-Powered Performance
+
 - **High Performance**: Built with Rust for maximum speed and efficiency
 - **Memory Safe**: Zero-cost abstractions with compile-time safety guarantees
 - **Type Safe**: Full type safety with comprehensive error handling
 - **Async/Await**: Modern async Rust for non-blocking operations
 
 ### 🛠️ Developer Experience
+
 - **Simple CLI**: `websearch "your query"` - that's it!
 - **Debug Support**: Configurable logging for development and debugging
 - **Provider Statistics**: Track performance metrics for each search provider
@@ -39,16 +43,16 @@ A high-performance Rust library and command-line tool for searching across multi
 
 ## Supported Search Providers
 
-| Provider | Status | API Key Required | Notes |
-|----------|--------|------------------|-------|
-| **Google Custom Search** | ✅ Complete | Yes | Requires API key + Search Engine ID |
-| **DuckDuckGo** | ✅ Complete | No | HTML scraping (text search) |
-| **Brave Search** | ✅ Complete | Yes | High-quality independent search |
-| **SerpAPI** | ✅ Complete | Yes | Google, Bing, Yahoo via SerpAPI |
-| **Tavily** | ✅ Complete | Yes | AI-powered search optimized for LLMs |
-| **Exa** | ✅ Complete | Yes | Semantic search with embeddings |
-| **SearXNG** | ✅ Complete | No | Self-hosted privacy-focused search |
-| **ArXiv** | ✅ Complete | No | Academic papers and research |
+| Provider                 | Status      | API Key Required | Notes                                |
+| ------------------------ | ----------- | ---------------- | ------------------------------------ |
+| **Google Custom Search** | ✅ Complete | Yes              | Requires API key + Search Engine ID  |
+| **DuckDuckGo**           | ✅ Complete | No               | HTML scraping (text search)          |
+| **Brave Search**         | ✅ Complete | Yes              | High-quality independent search      |
+| **SerpAPI**              | ✅ Complete | Yes              | Google, Bing, Yahoo via SerpAPI      |
+| **Tavily**               | ✅ Complete | Yes              | AI-powered search optimized for LLMs |
+| **Exa**                  | ✅ Complete | Yes              | Semantic search with embeddings      |
+| **SearXNG**              | ✅ Complete | No               | Self-hosted privacy-focused search   |
+| **ArXiv**                | ✅ Complete | No               | Academic papers and research         |
 
 ## 🚀 Installation
 
@@ -73,6 +77,7 @@ websearch "hello world" --max-results 1
 ### Installation Options
 
 #### 🌟 Option 1: Direct Install (Recommended)
+
 ```bash
 # Install from GitHub (gets you the latest features)
 cargo install --git https://github.com/xynehq/websearch.git
@@ -82,6 +87,7 @@ websearch "rust programming" --provider duckduckgo --max-results 3
 ```
 
 #### 📦 Option 2: From Crates.io (Coming Soon)
+
 ```bash
 # Install from crates.io (when published)
 cargo install websearch
@@ -91,6 +97,7 @@ websearch --help
 ```
 
 #### 🔧 Option 3: Development Install
+
 ```bash
 # Clone and install from source
 git clone https://github.com/xynehq/websearch.git
@@ -137,6 +144,7 @@ tokio = { version = "1.0", features = ["full"] }' >> Cargo.toml
 - **Network issues** → Try: `cargo install --git https://github.com/xynehq/websearch.git --offline`
 
 **Platform Support:**
+
 - ✅ **Linux**: Works out of the box
 - ✅ **macOS**: Requires Xcode tools: `xcode-select --install`
 - ✅ **Windows**: Requires Visual Studio Build Tools
@@ -193,18 +201,21 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
 ## 🎯 Why Use WebSearch?
 
 ### For CLI Users
+
 - **🚀 Zero Setup**: Works immediately with DuckDuckGo (no API keys needed)
 - **🔄 Multiple Providers**: Switch between 8+ search engines with a simple flag
 - **📊 Rich Output**: Table, JSON, or simple text formats
 - **🎛️ Advanced Features**: Multi-provider search with aggregation strategies
 
 ### For Rust Developers
+
 - **🦀 Native Performance**: Built with Rust for speed and safety
 - **🔧 Type Safety**: Full compile-time guarantees and error handling
 - **🔄 Provider Flexibility**: Easy to swap providers or use multiple simultaneously
 - **🛠️ Production Ready**: Async/await, comprehensive error handling, debug support
 
 ### For Both
+
 - **🌐 8+ Search Providers**: Google, Tavily AI, ArXiv, DuckDuckGo, Brave, Exa, SerpAPI, SearXNG
 - **📈 Multi-Provider**: Aggregate results, failover, load balancing, race strategies
 - **🔒 Secure**: Environment-based API key management
@@ -392,6 +403,7 @@ WebSearch provides a powerful CLI tool for searching from the command line with
 ### CLI Design Philosophy
 
 The CLI uses a simplified structure:
+
 - **Default behavior**: `websearch "query"` searches using DuckDuckGo (no API key required)
 - **Single provider**: `websearch "query" --provider google` searches with a specific provider
 - **Multi-provider**: `websearch multi "query" --strategy aggregate` for advanced multi-provider searches
@@ -468,10 +480,12 @@ websearch providers
 ### CLI Options
 
 #### Global Options
+
 - `--help` - Show help information
 - `--version` - Show version information
 
 #### Default Search Options
+
 - `--provider` - Search provider (google, tavily, exa, serpapi, duckduckgo, brave, searxng, arxiv) [default: duckduckgo]
 - `--max-results` - Maximum number of results [default: 10]
 - `--language` - Language code (e.g., en, es, fr)
@@ -482,11 +496,13 @@ websearch providers
 - `--raw` - Show raw provider response
 
 #### ArXiv-Specific Options
+
 - `--arxiv-ids` - Comma-separated ArXiv paper IDs (for ArXiv provider)
 - `--sort-by` - Sort by field (relevance, submitted-date, last-updated-date)
 - `--sort-order` - Sort order (ascending, descending)
 
 #### Multi Search Options (for `multi` subcommand)
+
 - `--strategy` - Multi-provider strategy (aggregate, failover, load-balance, race)
 - `--providers` - Specific providers to use
 - `--stats` - Show provider performance statistics
@@ -521,6 +537,7 @@ export SEARXNG_URL="https://your-searxng-instance.com"
 ### Output Formats
 
 #### Table Format (Default)
+
 ```
 Search Results from duckduckgo
 ────────────────────────────────────────────────────────────────────────────────
@@ -532,6 +549,7 @@ Search Results from duckduckgo
 ```
 
 #### Simple Format
+
 ```
 1. Rust Programming Language
 https://www.rust-lang.org/
@@ -539,6 +557,7 @@ Search Results from duckduckgo
 ```
 
 #### JSON Format
+
 ```json
 [
   {
@@ -658,6 +677,7 @@ cargo test --test tavily_integration_tests
 ```
 
 **Test Coverage:**
+
 - 29 unit tests covering core functionality
 - 13 integration tests for multi-provider scenarios
 - 15 Tavily-specific integration tests
@@ -685,17 +705,20 @@ This Rust implementation was initially based on the excellent [PlustOrg/search-s
 ### Enhancements Over TypeScript Version
 
 **Performance Improvements:**
+
 - **2-3x faster execution** with Rust's zero-cost abstractions
 - **Reduced memory footprint** (~80% less memory usage)
 - **Native async/await** with tokio for better concurrency
 
 **Additional Functionality:**
+
 - **Multi-provider search strategies** (failover, load balancing, aggregation, race)
 - **Provider performance statistics** and monitoring
 - **Advanced error handling** with structured error types and exhaustive pattern matching
 - **Compile-time safety** preventing common runtime errors
 
 **Rust-Specific Benefits:**
+
 - **Memory safety** without garbage collection overhead
 - **Thread safety** guaranteed at compile time
 - **Zero-cost abstractions** with no runtime performance penalty
@@ -707,9 +730,9 @@ This Rust port maintains conceptual API compatibility with the TypeScript versio
 ```typescript
 // TypeScript version
 const results = await webSearch({
-  query: 'rust programming',
+  query: "rust programming",
   maxResults: 5,
-  provider: googleProvider
+  provider: googleProvider,
 });
 ```
 
@@ -737,6 +760,7 @@ echo 'websearch = "0.1.1"' >> Cargo.toml
 ```
 
 **Perfect for:**
+
 - 🏃‍♂️ **Quick searches** from the command line
 - 🔬 **Research projects** requiring academic papers (ArXiv)
 - 🤖 **AI applications** needing web data
@@ -745,4 +769,4 @@ echo 'websearch = "0.1.1"' >> Cargo.toml
 
 ---
 
-*This Rust implementation was initially based on [PlustOrg/search-sdk](https://github.com/PlustOrg/search-sdk) and has evolved to include additional features while maintaining API compatibility and leveraging Rust's performance and safety benefits.*
+_This Rust implementation was initially based on [PlustOrg/search-sdk](https://github.com/PlustOrg/search-sdk) and has evolved to include additional features while maintaining API compatibility and leveraging Rust's performance and safety benefits._
diff --git a/src/providers/searxng.rs b/src/providers/searxng.rs
index 80a7b00..483bb0a 100644
--- a/src/providers/searxng.rs
+++ b/src/providers/searxng.rs
@@ -1,14 +1,29 @@
-//! SearXNG provider
-
 use crate::{
     error::{SearchError, SearchResult},
     types::{SearchOptions, SearchProvider, SearchResult as SearchResultType},
+    utils::{debug, http::HttpClient},
 };
+use serde::Deserialize;
 use std::collections::HashMap;
 
+// 1. Define how SearXNG's JSON looks so Rust can read it
+#[derive(Debug, Deserialize)]
+struct SearxResponse {
+    results: Vec<SearxResult>,
+}
+
+#[derive(Debug, Deserialize)]
+struct SearxResult {
+    url: Option<String>,
+    title: Option<String>,
+    content: Option<String>,
+    pubdate: Option<String>,
+}
+
 #[derive(Debug)]
 pub struct SearxNGProvider {
     base_url: String,
+    http_client: HttpClient,
 }
 
 impl SearxNGProvider {
@@ -21,6 +36,7 @@ impl SearxNGProvider {
 
         Ok(Self {
             base_url: base_url.to_string(),
+            http_client: HttpClient::new(),
         })
     }
 }
@@ -31,10 +47,50 @@ impl SearchProvider for SearxNGProvider {
         "searxng"
     }
 
-    async fn search(&self, _options: &SearchOptions) -> SearchResult<Vec<SearchResultType>> {
-        Err(SearchError::ProviderError(
-            "SearXNG provider implementation coming soon".to_string(),
-        ))
+    async fn search(&self, options: &SearchOptions) -> SearchResult<Vec<SearchResultType>> {
+        // 2. Build the URL with query parameters (?q=...&format=json)
+        let mut params = HashMap::new();
+        params.insert("q".to_string(), options.query.clone());
+        params.insert("format".to_string(), "json".to_string());
+
+        // Use the utility function from your own http.rs
+        let base_search_url = format!("{}/search", self.base_url.trim_end_matches('/'));
+        let url = crate::utils::http::build_url(&base_search_url, params)?;
+
+        debug::log_request(
+            &options.debug,
+            "SearXNG Search request",
+            &format!("url: {}", url),
+        );
+
+        // 3. Get the response as a plain String first
+        let response_text = self.http_client.get_text(&url).await?;
+
+        // 4. Convert that String into our Rust structs (JSON parsing)
+        let searx_data: SearxResponse = serde_json::from_str(&response_text)
+            .map_err(|e| SearchError::ParseError(format!("SearXNG JSON error: {}", e)))?;
+
+        // 5. Transform SearXNG results into the format the app expects
+        let mut results = Vec::new();
+        let max_results = options.max_results.unwrap_or(10) as usize;
+
+        for item in searx_data.results.into_iter().take(max_results) {
+            if let (Some(url), Some(title)) = (item.url, item.title) {
+                let domain = crate::utils::http::extract_domain(&url);
+
+                results.push(SearchResultType {
+                    url,
+                    title: crate::utils::http::normalize_text(&title),
+                    snippet: item.content.map(|c| crate::utils::http::normalize_text(&c)),
+                    domain,
+                    published_date: item.pubdate,
+                    provider: Some("searxng".to_string()),
+                    raw: None,
+                });
+            }
+        }
+
+        Ok(results)
     }
 
     fn config(&self) -> HashMap<String, String> {
@@ -43,3 +99,22 @@ impl SearchProvider for SearxNGProvider {
         config
     }
 }
+
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_searxng_provider_new() {
+        let provider = SearxNGProvider::new("https://searx.be");
+        assert!(provider.is_ok());
+        assert_eq!(provider.unwrap().name(), "searxng");
+    }
+
+    #[test]
+    fn test_searxng_empty_url() {
+        let provider = SearxNGProvider::new("");
+        assert!(provider.is_err());
+    }
+}