50 changes: 37 additions & 13 deletions README.md
@@ -14,41 +14,45 @@ A high-performance Rust library and command-line tool for searching across multi
## Features

### 🏗️ Dual Purpose Design

- **📚 Rust Library**: Integrate web search into your Rust applications
- **⚡ CLI Binary**: Ready-to-use command-line search tool
- **🔧 Single Installation**: One `cargo install` command gets you both

### 🔍 Search Capabilities

- **Multiple Providers**: Unified interface for 8+ search providers
- **Standardized Results**: Consistent result format across all providers
- **Multi-Provider Search**: Query multiple search engines simultaneously
- **Load Balancing**: Distribute requests across providers with failover support
- **Result Aggregation**: Combine and merge results from multiple providers

### 🦀 Rust-Powered Performance

- **High Performance**: Built with Rust for maximum speed and efficiency
- **Memory Safe**: Zero-cost abstractions with compile-time safety guarantees
- **Type Safe**: Full type safety with comprehensive error handling
- **Async/Await**: Modern async Rust for non-blocking operations

### 🛠️ Developer Experience

- **Simple CLI**: `websearch "your query"` - that's it!
- **Debug Support**: Configurable logging for development and debugging
- **Provider Statistics**: Track performance metrics for each search provider
- **Race Strategy**: Use fastest responding provider for optimal performance

## Supported Search Providers

| Provider | Status | API Key Required | Notes |
|----------|--------|------------------|-------|
| **Google Custom Search** | ✅ Complete | Yes | Requires API key + Search Engine ID |
| **DuckDuckGo** | ✅ Complete | No | HTML scraping (text search) |
| **Brave Search** | ✅ Complete | Yes | High-quality independent search |
| **SerpAPI** | ✅ Complete | Yes | Google, Bing, Yahoo via SerpAPI |
| **Tavily** | ✅ Complete | Yes | AI-powered search optimized for LLMs |
| **Exa** | ✅ Complete | Yes | Semantic search with embeddings |
| **SearXNG** | ✅ Complete | No | Self-hosted privacy-focused search |
| **ArXiv** | ✅ Complete | No | Academic papers and research |
| Provider | Status | API Key Required | Notes |
| ------------------------ | ----------- | ---------------- | ------------------------------------ |
| **Google Custom Search** | ✅ Complete | Yes | Requires API key + Search Engine ID |
| **DuckDuckGo** | ✅ Complete | No | HTML scraping (text search) |
| **Brave Search** | ✅ Complete | Yes | High-quality independent search |
| **SerpAPI** | ✅ Complete | Yes | Google, Bing, Yahoo via SerpAPI |
| **Tavily** | ✅ Complete | Yes | AI-powered search optimized for LLMs |
| **Exa** | ✅ Complete | Yes | Semantic search with embeddings |
| **SearXNG** | ✅ Complete | Yes | Self-hosted privacy-focused search |
| **ArXiv** | ✅ Complete | No | Academic papers and research |

## 🚀 Installation

@@ -73,6 +77,7 @@ websearch "hello world" --max-results 1
### Installation Options

#### 🌟 Option 1: Direct Install (Recommended)

```bash
# Install from GitHub (gets you the latest features)
cargo install --git https://github.com/xynehq/websearch.git
@@ -82,6 +87,7 @@ websearch "rust programming" --provider duckduckgo --max-results 3
```

#### 📦 Option 2: From Crates.io (Coming Soon)

```bash
# Install from crates.io (when published)
cargo install websearch
@@ -91,6 +97,7 @@ websearch --help
```

#### 🔧 Option 3: Development Install

```bash
# Clone and install from source
git clone https://github.com/xynehq/websearch.git
@@ -137,6 +144,7 @@ tokio = { version = "1.0", features = ["full"] }' >> Cargo.toml
- **Network issues** → Try: `cargo install --git https://github.com/xynehq/websearch.git --offline`

**Platform Support:**

- ✅ **Linux**: Works out of the box
- ✅ **macOS**: Requires Xcode tools: `xcode-select --install`
- ✅ **Windows**: Requires Visual Studio Build Tools
@@ -193,18 +201,21 @@ async fn main() -> Result<(), Box<dyn std::error::Error>> {
## 🎯 Why Use WebSearch?

### For CLI Users

- **🚀 Zero Setup**: Works immediately with DuckDuckGo (no API keys needed)
- **🔄 Multiple Providers**: Switch between 8+ search engines with a simple flag
- **📊 Rich Output**: Table, JSON, or simple text formats
- **🎛️ Advanced Features**: Multi-provider search with aggregation strategies

### For Rust Developers

- **🦀 Native Performance**: Built with Rust for speed and safety
- **🔧 Type Safety**: Full compile-time guarantees and error handling
- **🔄 Provider Flexibility**: Easy to swap providers or use multiple simultaneously
- **🛠️ Production Ready**: Async/await, comprehensive error handling, debug support

### For Both

- **🌐 8+ Search Providers**: Google, Tavily AI, ArXiv, DuckDuckGo, Brave, Exa, SerpAPI, SearXNG
- **📈 Multi-Provider**: Aggregate results, failover, load balancing, race strategies
- **🔒 Secure**: Environment-based API key management
@@ -392,6 +403,7 @@ WebSearch provides a powerful CLI tool for searching from the command line with
### CLI Design Philosophy

The CLI uses a simplified structure:

- **Default behavior**: `websearch "query"` searches using DuckDuckGo (no API key required)
- **Single provider**: `websearch "query" --provider google` searches with a specific provider
- **Multi-provider**: `websearch multi "query" --strategy aggregate` for advanced multi-provider searches
@@ -468,10 +480,12 @@ websearch providers
### CLI Options

#### Global Options

- `--help` - Show help information
- `--version` - Show version information

#### Default Search Options

- `--provider` - Search provider (google, tavily, exa, serpapi, duckduckgo, brave, searxng, arxiv) [default: duckduckgo]
- `--max-results` - Maximum number of results [default: 10]
- `--language` - Language code (e.g., en, es, fr)
@@ -482,11 +496,13 @@ websearch providers
- `--raw` - Show raw provider response

#### ArXiv-Specific Options

- `--arxiv-ids` - Comma-separated ArXiv paper IDs (for ArXiv provider)
- `--sort-by` - Sort by field (relevance, submitted-date, last-updated-date)
- `--sort-order` - Sort order (ascending, descending)

#### Multi Search Options (for `multi` subcommand)

- `--strategy` - Multi-provider strategy (aggregate, failover, load-balance, race)
- `--providers` - Specific providers to use
- `--stats` - Show provider performance statistics
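
The `failover` strategy named in the options above is not implemented anywhere in this diff, so the following is only a minimal sketch of the idea, assuming a provider can be modeled as a name plus a plain function. `Provider`, `search_with_failover`, `always_fails`, and `echo` are hypothetical names used for illustration; they are not the crate's API.

```rust
// Hypothetical stand-in for a search provider: a name plus a function that
// returns either result titles or an error message. Not the crate's real trait.
struct Provider {
    name: &'static str,
    search: fn(&str) -> Result<Vec<String>, String>,
}

/// Failover: try providers in order and return the first success together
/// with the name of the provider that answered. If every provider fails,
/// return the collected error messages instead.
fn search_with_failover(
    providers: &[Provider],
    query: &str,
) -> Result<(&'static str, Vec<String>), Vec<String>> {
    let mut errors = Vec::new();
    for p in providers {
        match (p.search)(query) {
            Ok(results) => return Ok((p.name, results)),
            Err(e) => errors.push(format!("{}: {}", p.name, e)),
        }
    }
    Err(errors)
}

// Two toy providers for demonstration: one that always errors, one that
// echoes the query back as a single result.
fn always_fails(_query: &str) -> Result<Vec<String>, String> {
    Err("timeout".to_string())
}

fn echo(query: &str) -> Result<Vec<String>, String> {
    Ok(vec![format!("result for {}", query)])
}
```

Called with `[always_fails, echo]` in that order, the sketch skips the failing provider and returns the second provider's results. A real implementation would be async and would presumably operate on the crate's `SearchProvider` trait rather than on function pointers.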
@@ -521,6 +537,7 @@ export SEARXNG_URL="https://your-searxng-instance.com"
### Output Formats

#### Table Format (Default)

```
Search Results from duckduckgo
────────────────────────────────────────────────────────────────────────────────
@@ -532,13 +549,15 @@ Search Results from duckduckgo
```

#### Simple Format

```
1. Rust Programming Language
https://www.rust-lang.org/
Rust is a fast, reliable, and productive programming language...
```

#### JSON Format

```json
[
{
@@ -658,6 +677,7 @@ cargo test --test tavily_integration_tests
```

**Test Coverage:**

- 29 unit tests covering core functionality
- 13 integration tests for multi-provider scenarios
- 15 Tavily-specific integration tests
@@ -685,17 +705,20 @@ This Rust implementation was initially based on the excellent [PlustOrg/search-s
### Enhancements Over TypeScript Version

**Performance Improvements:**

- **2-3x faster execution** with Rust's zero-cost abstractions
- **Reduced memory footprint** (~80% less memory usage)
- **Native async/await** with tokio for better concurrency

**Additional Functionality:**

- **Multi-provider search strategies** (failover, load balancing, aggregation, race)
- **Provider performance statistics** and monitoring
- **Advanced error handling** with structured error types and exhaustive pattern matching
- **Compile-time safety** preventing common runtime errors

**Rust-Specific Benefits:**

- **Memory safety** without garbage collection overhead
- **Thread safety** guaranteed at compile time
- **Zero-cost abstractions** with no runtime performance penalty
@@ -707,9 +730,9 @@ This Rust port maintains conceptual API compatibility with the TypeScript versio
```typescript
// TypeScript version
const results = await webSearch({
  query: 'rust programming',
  query: "rust programming",
  maxResults: 5,
  provider: googleProvider
  provider: googleProvider,
});
```

@@ -737,6 +760,7 @@ echo 'websearch = "0.1.1"' >> Cargo.toml
```

**Perfect for:**

- 🏃‍♂️ **Quick searches** from the command line
- 🔬 **Research projects** requiring academic papers (ArXiv)
- 🤖 **AI applications** needing web data
@@ -745,4 +769,4 @@ echo 'websearch = "0.1.1"' >> Cargo.toml

---

*This Rust implementation was initially based on [PlustOrg/search-sdk](https://github.com/PlustOrg/search-sdk) and has evolved to include additional features while maintaining API compatibility and leveraging Rust's performance and safety benefits.*
_This Rust implementation was initially based on [PlustOrg/search-sdk](https://github.com/PlustOrg/search-sdk) and has evolved to include additional features while maintaining API compatibility and leveraging Rust's performance and safety benefits._
87 changes: 81 additions & 6 deletions src/providers/searxng.rs
@@ -1,14 +1,29 @@
//! SearXNG provider

use crate::{
    error::{SearchError, SearchResult},
    types::{SearchOptions, SearchProvider, SearchResult as SearchResultType},
    utils::{debug, http::HttpClient},
};
use serde::Deserialize;
use std::collections::HashMap;

// 1. Model the shape of SearXNG's JSON response so serde can deserialize it
#[derive(Debug, Deserialize)]
struct SearxResponse {
    results: Vec<SearxResult>,
}

#[derive(Debug, Deserialize)]
struct SearxResult {
    url: Option<String>,
    title: Option<String>,
    content: Option<String>,
    pubdate: Option<String>,
}

#[derive(Debug)]
pub struct SearxNGProvider {
    base_url: String,
    http_client: HttpClient,
}

impl SearxNGProvider {
@@ -21,6 +36,7 @@ impl SearxNGProvider {

        Ok(Self {
            base_url: base_url.to_string(),
            http_client: HttpClient::new(),
        })
    }
}
@@ -31,10 +47,50 @@ impl SearchProvider for SearxNGProvider {
        "searxng"
    }

    async fn search(&self, _options: &SearchOptions) -> SearchResult<Vec<SearchResultType>> {
        Err(SearchError::ProviderError(
            "SearXNG provider implementation coming soon".to_string(),
        ))
    async fn search(&self, options: &SearchOptions) -> SearchResult<Vec<SearchResultType>> {
        // 2. Build the URL with query parameters (?q=...&format=json)
        let mut params = HashMap::new();
        params.insert("q".to_string(), options.query.clone());
        params.insert("format".to_string(), "json".to_string());

        // Build the request URL with the `build_url` helper from `utils::http`
        let base_search_url = format!("{}/search", self.base_url.trim_end_matches('/'));
        let url = crate::utils::http::build_url(&base_search_url, params)?;

        debug::log_request(
            &options.debug,
            "SearXNG Search request",
            &format!("url: {}", url),
        );

        // 3. Get the response as a plain String first
        let response_text = self.http_client.get_text(&url).await?;

        // 4. Convert that String into our Rust structs (JSON parsing)
        let searx_data: SearxResponse = serde_json::from_str(&response_text)
            .map_err(|e| SearchError::ParseError(format!("SearXNG JSON error: {}", e)))?;

        // 5. Transform SearXNG results into the format the app expects
        let mut results = Vec::new();
        let max_results = options.max_results.unwrap_or(10) as usize;

        for item in searx_data.results.into_iter().take(max_results) {
            if let (Some(url), Some(title)) = (item.url, item.title) {
                let domain = crate::utils::http::extract_domain(&url);

                results.push(SearchResultType {
                    url,
                    title: crate::utils::http::normalize_text(&title),
                    snippet: item.content.map(|c| crate::utils::http::normalize_text(&c)),
                    domain,
                    published_date: item.pubdate,
                    provider: Some("searxng".to_string()),
                    raw: None,
                });
            }
        }

        Ok(results)
    }

    fn config(&self) -> HashMap<String, String> {
@@ -43,3 +99,22 @@ impl SearchProvider for SearxNGProvider {
        config
    }
}
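
Step 2 of `search` delegates URL construction to `crate::utils::http::build_url`, whose body is not part of this diff. As a rough illustration of what the resulting request URL should look like, here is a std-only sketch; `encode_component` and `searxng_search_url` are hypothetical helpers written for this example, and the real `build_url` may order or encode parameters differently.

```rust
/// Percent-encode a query-string value, leaving only RFC 3986 unreserved
/// characters (letters, digits, '-', '.', '_', '~') untouched.
/// Hypothetical helper; the crate's own encoding is not shown in the diff.
fn encode_component(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build `<base>/search?q=<query>&format=json`, trimming any trailing slash
/// from the base URL first, as the provider code in the diff does.
fn searxng_search_url(base_url: &str, query: &str) -> String {
    format!(
        "{}/search?q={}&format=json",
        base_url.trim_end_matches('/'),
        encode_component(query)
    )
}
```

Because the provider collects its parameters in a `HashMap`, the order of `q` and `format` in the real URL may vary between runs, depending on how `build_url` iterates the map; that is harmless for query strings but worth knowing when reading debug logs.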


#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_searxng_provider_new() {
        let provider = SearxNGProvider::new("https://searx.be");
        assert!(provider.is_ok());
        assert_eq!(provider.unwrap().name(), "searxng");
    }

    #[test]
    fn test_searxng_empty_url() {
        let provider = SearxNGProvider::new("");
        assert!(provider.is_err());
    }
}
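
One behavior of step 5 in `search` that may be worth a second look: the loop applies `.take(max_results)` before discarding items that lack a `url` or `title`, so a response containing incomplete items can yield fewer than `max_results` results even when enough complete ones exist. The std-only sketch below contrasts the two orderings. `RawItem`, `take_then_filter`, `filter_then_take`, and `sample` are hypothetical names for this illustration, and whether the alternative ordering is preferable here is a judgment call for the author.

```rust
// Hypothetical, trimmed-down stand-in for the `SearxResult` struct in the diff.
struct RawItem {
    url: Option<String>,
    title: Option<String>,
}

/// Mirrors the ordering in the diff: cap the list first, then drop
/// items that are missing a URL or a title.
fn take_then_filter(items: Vec<RawItem>, max: usize) -> Vec<(String, String)> {
    items
        .into_iter()
        .take(max)
        .filter_map(|i| Some((i.url?, i.title?)))
        .collect()
}

/// Alternative ordering: drop incomplete items first, then cap the list.
fn filter_then_take(items: Vec<RawItem>, max: usize) -> Vec<(String, String)> {
    items
        .into_iter()
        .filter_map(|i| Some((i.url?, i.title?)))
        .take(max)
        .collect()
}

/// Three items, the first of which has no URL.
fn sample() -> Vec<RawItem> {
    vec![
        RawItem { url: None, title: Some("no url".into()) },
        RawItem { url: Some("https://a.example".into()), title: Some("A".into()) },
        RawItem { url: Some("https://b.example".into()), title: Some("B".into()) },
    ]
}
```

With `max = 2`, `take_then_filter(sample(), 2)` keeps one result, because the incomplete first item uses up one of the two slots, while `filter_then_take(sample(), 2)` keeps both complete items.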