Commit 0a91c23

feat: v0.6.0 — Bing engine, dynamic proxy pool, health monitor, HCL config
Core:
- Add Bing search engine with query param extraction
- Add HealthMonitor for engine health tracking and auto-disable
- Add HCL configuration support (SearchConfig)
- Add PooledHttpFetcher with per-request proxy rotation
- Add ProxyProvider trait for external dynamic proxy sources
- Add spawn_auto_refresh for background proxy pool refresh
- Add AtomicBool-based proxy pool enabled toggle (thread-safe via Arc)
- Refactor engines to use HtmlEngine base with query params

SDKs:
- Add dynamic proxy pool management (set_proxy_pool, set_proxy_pool_enabled, is_proxy_pool_enabled, proxy_pool_size) to both Node.js and Python SDKs
- Add per-request proxy_pool option to search methods
- Bump Node.js SDK to 0.6.0 with aarch64 platform support

Tests:
- Add proxy pool integration tests (7 tests)
- 288 total tests passing, zero clippy warnings
1 parent dccc08f commit 0a91c23

35 files changed

Lines changed: 2979 additions & 1025 deletions

Cargo.toml

Lines changed: 3 additions & 3 deletions
@@ -31,7 +31,7 @@ name = "a3s-search"
 path = "src/main.rs"
 
 [features]
-default = ["headless"]
+default = []
 headless = ["dep:chromiumoxide", "dep:which", "dep:zip"]
 
 [dependencies]
@@ -63,8 +63,8 @@ scraper = "0.22"
 url = "2"
 urlencoding = "2"
 
-# Regex
-regex = "1"
+# Configuration file parsing
+hcl-rs = "0.19"
 
 # Headless browser (optional, for JS-rendered engines)
 chromiumoxide = { version = "0.7", features = ["tokio-runtime"], optional = true }

README.md

Lines changed: 137 additions & 34 deletions
@@ -60,12 +60,14 @@ async fn main() -> anyhow::Result<()> {
 - **Async-First**: Built on Tokio for high-performance concurrent searches
 - **Timeout Handling**: Per-engine timeout with graceful degradation
 - **Extensible**: Easy to add custom search engines via the `Engine` trait
-- **Proxy Pool**: Dynamic proxy IP rotation to avoid anti-crawler blocking
+- **Dynamic Proxy Pool**: IP rotation with pluggable `ProxyProvider` trait and auto-refresh
+- **Health Monitor**: Automatic engine suspension after repeated failures with configurable recovery
+- **HCL Configuration**: Load engine and health settings from HCL config files
 - **Headless Browser**: Optional Chrome/Chromium integration for JS-rendered engines (feature-gated)
 - **Auto-Install Chrome**: Automatically detects or downloads Chrome for Testing when no browser is found
-- **PageFetcher Abstraction**: Pluggable page fetching (plain HTTP or headless browser)
+- **PageFetcher Abstraction**: Pluggable page fetching `HttpFetcher`, `PooledHttpFetcher`, or `BrowserFetcher`
 - **CLI Tool**: Command-line interface for quick searches
-- **Native SDKs**: TypeScript (NAPI) and Python (PyO3) bindings with async support
+- **Native SDKs**: TypeScript (NAPI) and Python (PyO3) bindings with async support and dynamic proxy pool management
 
 ## CLI Usage
 
@@ -125,6 +127,7 @@ a3s-search engines
 |----------|--------|-------------|
 | `ddg` | DuckDuckGo | Privacy-focused search |
 | `brave` | Brave | Brave Search |
+| `bing` | Bing | Bing International |
 | `wiki` | Wikipedia | Wikipedia API |
 | `sogou` | Sogou | 搜狗搜索 |
 | `360` | 360 Search | 360搜索 |
@@ -140,6 +143,7 @@ a3s-search engines
 |--------|----------|-------------|
 | DuckDuckGo | `ddg` | Privacy-focused search |
 | Brave | `brave` | Brave Search |
+| Bing | `bing` | Bing International |
 | Wikipedia | `wiki` | Wikipedia API |
 | Google | `g` | Google Search (headless browser) |
 
@@ -198,12 +202,24 @@ const response = await search.search('rust programming');
 
 // With options
 const response = await search.search('rust programming', {
-  engines: ['ddg', 'wiki', 'brave'],
+  engines: ['ddg', 'wiki', 'brave', 'bing'],
   limit: 5,
   timeout: 15,
   proxy: 'http://127.0.0.1:8080',
 });
 
+// Dynamic proxy pool (IP rotation)
+await search.setProxyPool([
+  'http://10.0.0.1:8080',
+  'http://10.0.0.2:8080',
+  'socks5://10.0.0.3:1080',
+]);
+const response = await search.search('rust programming');
+
+// Toggle proxy pool at runtime
+search.setProxyPoolEnabled(false); // direct connection
+search.setProxyPoolEnabled(true); // re-enable rotation
+
 for (const r of response.results) {
   console.log(`${r.title}: ${r.url} (score: ${r.score})`);
 }
@@ -227,12 +243,24 @@ response = await search.search("rust programming")
 
 # With options
 response = await search.search("rust programming",
-    engines=["ddg", "wiki", "brave"],
+    engines=["ddg", "wiki", "brave", "bing"],
     limit=5,
     timeout=15,
     proxy="http://127.0.0.1:8080",
 )
 
+# Dynamic proxy pool (IP rotation)
+await search.set_proxy_pool([
+    "http://10.0.0.1:8080",
+    "http://10.0.0.2:8080",
+    "socks5://10.0.0.3:1080",
+])
+response = await search.search("rust programming")
+
+# Toggle proxy pool at runtime
+search.set_proxy_pool_enabled(False) # direct connection
+search.set_proxy_pool_enabled(True) # re-enable rotation
+
 for r in response.results:
     print(f"{r.title}: {r.url} (score: {r.score})")
 print(f"{response.count} results in {response.duration_ms}ms")
@@ -246,6 +274,7 @@ Both SDKs support HTTP-based engines (no headless browser required):
 |----------|---------|--------|
 | `ddg` | `duckduckgo` | DuckDuckGo |
 | `brave` || Brave Search |
+| `bing` || Bing International |
 | `wiki` | `wikipedia` | Wikipedia API |
 | `sogou` || Sogou (搜狗) |
 | `360` | `so360` | 360 Search (360搜索) |
@@ -311,10 +340,10 @@ just cov-html
 ### Running Tests
 
 ```bash
-# Default build (8 engines, 298 tests)
+# Default build (9 engines, 244+ lib tests)
 cargo test -p a3s-search --lib
 
-# Without headless (5 engines)
+# Without headless (6 engines)
 cargo test -p a3s-search --no-default-features --lib
 
 # Integration tests (requires network + Chrome for Google)
@@ -373,8 +402,9 @@ weight = engine_weight × num_engines_found
 └─────────────────────────────────────────────────────┘
 
 PageFetcher (trait)
-├── HttpFetcher (reqwest, plain HTTP)
-└── BrowserFetcher (chromiumoxide, headless Chrome)
+├── HttpFetcher (reqwest, plain HTTP, single proxy)
+├── PooledHttpFetcher (reqwest, proxy pool rotation)
+└── BrowserFetcher (chromiumoxide, headless Chrome)
     └── BrowserPool (shared process, tab semaphore)
 ```
 
@@ -386,11 +416,11 @@ Add to your `Cargo.toml`:
 
 ```toml
 [dependencies]
-a3s-search = "0.5"
+a3s-search = "0.6"
 tokio = { version = "1", features = ["full"] }
 
 # To disable headless browser support:
-# a3s-search = { version = "0.5", default-features = false }
+# a3s-search = { version = "0.6", default-features = false }
 ```
 
 ### Basic Search
@@ -453,42 +483,51 @@ search.add_engine(wiki);
 ### Using Proxy Pool (Anti-Crawler Protection)
 
 ```rust
-use a3s_search::{Search, SearchQuery, engines::DuckDuckGo};
+use std::sync::Arc;
+use a3s_search::{Search, SearchQuery, PooledHttpFetcher, PageFetcher};
+use a3s_search::engines::{DuckDuckGo, DuckDuckGoParser};
 use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProtocol, ProxyStrategy};
 
 // Create a proxy pool with multiple proxies
-let proxy_pool = ProxyPool::with_proxies(vec![
+let pool = Arc::new(ProxyPool::with_proxies(vec![
     ProxyConfig::new("proxy1.example.com", 8080),
     ProxyConfig::new("proxy2.example.com", 8080)
         .with_protocol(ProxyProtocol::Socks5),
     ProxyConfig::new("proxy3.example.com", 8080)
         .with_auth("username", "password"),
-]).with_strategy(ProxyStrategy::RoundRobin);
+]).with_strategy(ProxyStrategy::RoundRobin));
+
+// PooledHttpFetcher rotates proxies per request
+let fetcher: Arc<dyn PageFetcher> = Arc::new(PooledHttpFetcher::new(Arc::clone(&pool)));
 
 let mut search = Search::new();
-search.set_proxy_pool(proxy_pool);
-search.add_engine(DuckDuckGo::new());
+search.add_engine(DuckDuckGo::with_fetcher(DuckDuckGoParser, fetcher));
 
 let query = SearchQuery::new("rust programming");
 let results = search.search(query).await?;
+
+// Toggle proxy pool at runtime (thread-safe via AtomicBool)
+pool.set_enabled(false); // direct connection
+pool.set_enabled(true); // re-enable rotation
 ```
 
 ### Dynamic Proxy Provider
 
 ```rust
-use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProvider};
+use std::sync::Arc;
+use a3s_search::proxy::{ProxyPool, ProxyConfig, ProxyProvider, spawn_auto_refresh};
 use async_trait::async_trait;
 use std::time::Duration;
 
-// Implement custom proxy provider (e.g., from API)
+// Implement custom proxy provider (e.g., from API, Redis, database)
 struct MyProxyProvider {
     api_url: String,
 }
 
 #[async_trait]
 impl ProxyProvider for MyProxyProvider {
     async fn fetch_proxies(&self) -> a3s_search::Result<Vec<ProxyConfig>> {
-        // Fetch proxies from your API
+        // Fetch proxies from your API — format is up to you
         Ok(vec![
             ProxyConfig::new("dynamic-proxy.example.com", 8080),
         ])
@@ -499,10 +538,12 @@ impl ProxyProvider for MyProxyProvider {
     }
 }
 
-// Use with proxy pool
-let provider = MyProxyProvider { api_url: "https://api.example.com/proxies".into() };
-let proxy_pool = ProxyPool::with_provider(provider);
-proxy_pool.refresh().await?; // Initial fetch
+// Use with auto-refresh background task
+let pool = Arc::new(ProxyPool::with_provider(
+    MyProxyProvider { api_url: "https://api.example.com/proxies".into() }
+));
+let _refresh_handle = spawn_auto_refresh(Arc::clone(&pool));
+// Pool now auto-refreshes every 60 seconds
 ```
 
 ### Implementing Custom Engines
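
Note on the auto-refresh change in the hunk above: the sketch below shows one way a periodic refresh loop with a stoppable handle can be built, using only the standard library. It is not the crate's implementation. It uses an OS thread and a stop flag instead of a Tokio task, and `ProxySource`, `Pool`, `spawn_refresh_loop`, `StaticSource`, and `demo_refresh` are hypothetical names introduced for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a dynamic proxy source (not the crate's `ProxyProvider`).
trait ProxySource: Send + Sync {
    fn fetch(&self) -> Vec<String>;
    fn refresh_interval(&self) -> Duration;
}

/// Hypothetical pool that replaces its list with whatever the source returns.
struct Pool {
    proxies: Mutex<Vec<String>>,
    source: Box<dyn ProxySource>,
}

impl Pool {
    fn new(source: Box<dyn ProxySource>) -> Self {
        Pool { proxies: Mutex::new(Vec::new()), source }
    }

    /// Fetch a fresh list and swap it in under the lock.
    fn refresh(&self) {
        let fresh = self.source.fetch();
        *self.proxies.lock().unwrap() = fresh;
    }

    fn len(&self) -> usize {
        self.proxies.lock().unwrap().len()
    }
}

/// Refresh once immediately, then once per interval until `stop` is set.
fn spawn_refresh_loop(pool: Arc<Pool>, stop: Arc<AtomicBool>) -> JoinHandle<()> {
    thread::spawn(move || loop {
        pool.refresh();
        if stop.load(Ordering::Relaxed) {
            break;
        }
        thread::sleep(pool.source.refresh_interval());
    })
}

/// Fixed two-proxy source with a short interval, for demonstration only.
struct StaticSource;

impl ProxySource for StaticSource {
    fn fetch(&self) -> Vec<String> {
        vec![
            "http://10.0.0.1:8080".to_string(),
            "http://10.0.0.2:8080".to_string(),
        ]
    }

    fn refresh_interval(&self) -> Duration {
        Duration::from_millis(10)
    }
}

/// Start the loop, let it run briefly, stop it, and report the pool size.
fn demo_refresh() -> usize {
    let pool = Arc::new(Pool::new(Box::new(StaticSource)));
    let stop = Arc::new(AtomicBool::new(false));
    let handle = spawn_refresh_loop(Arc::clone(&pool), Arc::clone(&stop));
    thread::sleep(Duration::from_millis(30));
    stop.store(true, Ordering::Relaxed);
    handle.join().unwrap();
    pool.len()
}
```

Refreshing before the first stop check means the pool is populated at least once even if the loop is stopped right away. A Tokio-based version would typically swap the stop flag for aborting the returned task handle.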
@@ -558,12 +599,11 @@ impl Engine for MySearchEngine {
 | Method | Description |
 |--------|-------------|
 | `new()` | Create a new search instance |
+| `with_health_config(config)` | Create with health monitoring |
 | `add_engine(engine)` | Add a search engine |
 | `set_timeout(duration)` | Set default search timeout |
 | `engine_count()` | Get number of configured engines |
 | `search(query)` | Perform a search |
-| `set_proxy_pool(pool)` | Set proxy pool for anti-crawler |
-| `proxy_pool()` | Get reference to proxy pool |
 
 ### SearchQuery
 
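
Note on `with_health_config(config)` in the hunk above: this README revision states that engines are suspended after `max_failures` consecutive failures and re-enabled after `suspend_duration`. The sketch below is one possible reading of that rule, not the crate's `HealthMonitor`. The struct layout, the method names `record_success`, `record_failure`, and `is_available`, and the explicit `now` parameter are assumptions made for illustration and testability.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Mirrors the two fields named in the README; the real struct may differ.
#[derive(Clone, Copy)]
struct HealthConfig {
    max_failures: u32,
    suspend_duration: Duration,
}

/// Per-engine bookkeeping (hypothetical layout).
#[derive(Default)]
struct EngineState {
    consecutive_failures: u32,
    suspended_until: Option<Instant>,
}

struct HealthMonitor {
    config: HealthConfig,
    engines: HashMap<String, EngineState>,
}

impl HealthMonitor {
    fn new(config: HealthConfig) -> Self {
        HealthMonitor { config, engines: HashMap::new() }
    }

    /// A success clears the failure streak and any suspension.
    fn record_success(&mut self, engine: &str) {
        let state = self.engines.entry(engine.to_string()).or_default();
        state.consecutive_failures = 0;
        state.suspended_until = None;
    }

    /// A failure extends the streak; reaching the threshold suspends the engine.
    fn record_failure(&mut self, engine: &str, now: Instant) {
        let state = self.engines.entry(engine.to_string()).or_default();
        state.consecutive_failures += 1;
        if state.consecutive_failures >= self.config.max_failures {
            state.suspended_until = Some(now + self.config.suspend_duration);
            state.consecutive_failures = 0;
        }
    }

    /// Unknown engines and engines whose suspension has elapsed are available.
    fn is_available(&self, engine: &str, now: Instant) -> bool {
        match self.engines.get(engine).and_then(|s| s.suspended_until) {
            Some(until) => now >= until,
            None => true,
        }
    }
}
```

Passing `now` in explicitly keeps the logic deterministic under test; a production monitor would more likely read the clock internally and guard the map with a lock so that concurrent searches can share it.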
@@ -647,14 +687,68 @@ pub trait Engine: Send + Sync {
 | `with_proxies(proxies)` | Create with static proxy list |
 | `with_provider(provider)` | Create with dynamic provider |
 | `with_strategy(strategy)` | Set selection strategy |
-| `set_enabled(bool)` | Enable/disable proxy pool |
+| `set_enabled(bool)` | Enable/disable proxy pool (thread-safe, `&self`) |
 | `is_enabled()` | Check if enabled |
 | `refresh()` | Refresh proxies from provider |
 | `get_proxy()` | Get next proxy (based on strategy) |
 | `add_proxy(proxy)` | Add a proxy to pool |
 | `remove_proxy(host, port)` | Remove a proxy |
+| `len()` | Number of proxies in pool |
 | `create_client(user_agent)` | Create HTTP client with proxy |
 
+### PooledHttpFetcher
+
+| Method | Description |
+|--------|-------------|
+| `new(pool)` | Create with `Arc<ProxyPool>` — rotates proxy per request |
+| `with_timeout(duration)` | Set request timeout (default: 30s) |
+
+### spawn_auto_refresh
+
+```rust
+pub fn spawn_auto_refresh(pool: Arc<ProxyPool>) -> tokio::task::JoinHandle<()>
+```
+
+Spawns a background task that periodically calls `pool.refresh()` based on the provider's `refresh_interval()`. Returns a handle that can be aborted to stop refreshing.
+
+### HealthMonitor / HealthConfig
+
+| Field/Method | Description |
+|--------|-------------|
+| `HealthConfig { max_failures, suspend_duration }` | Configure failure threshold and suspension time |
+| `Search::with_health_config(config)` | Create search with health monitoring |
+
+Engines are automatically suspended after `max_failures` consecutive failures and re-enabled after `suspend_duration`.
+
+### SearchConfig (HCL)
+
+| Method | Description |
+|--------|-------------|
+| `SearchConfig::load(path)` | Load config from `.hcl` file |
+| `SearchConfig::parse(content)` | Parse HCL string |
+| `health_config()` | Get `HealthConfig` from config |
+| `enabled_engines()` | Get list of enabled engine shortcuts |
+
+Example HCL config:
+```hcl
+timeout = 10
+
+health {
+  max_failures = 5
+  suspend_seconds = 120
+}
+
+engine "ddg" {
+  enabled = true
+  weight = 1.0
+}
+
+engine "bing" {
+  enabled = true
+  weight = 1.2
+}
+```
+
 ### ProxyConfig
 
 | Method | Description |
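
Note on the ProxyPool rows in the hunk above: `set_enabled(bool)` is documented as thread-safe through `&self`, and `get_proxy()` returns the next proxy according to the strategy. The sketch below shows how round-robin selection and an atomic on/off switch can be combined so that a pool shared via `Arc` needs no `&mut` access. `RoundRobinPool` and its fields are hypothetical, and proxies are plain strings here rather than `ProxyConfig` values.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical pool: a fixed proxy list, a rotating cursor, and an on/off flag.
struct RoundRobinPool {
    proxies: Vec<String>,
    next: AtomicUsize,
    enabled: AtomicBool,
}

impl RoundRobinPool {
    fn new(proxies: Vec<String>) -> Self {
        RoundRobinPool {
            proxies,
            next: AtomicUsize::new(0),
            enabled: AtomicBool::new(true),
        }
    }

    /// Takes `&self`, so any holder of an `Arc` clone can flip the switch.
    fn set_enabled(&self, on: bool) {
        self.enabled.store(on, Ordering::Relaxed);
    }

    fn is_enabled(&self) -> bool {
        self.enabled.load(Ordering::Relaxed)
    }

    fn len(&self) -> usize {
        self.proxies.len()
    }

    /// Returns `None` when disabled or empty (caller connects directly);
    /// otherwise advances the cursor and returns the proxy at that slot.
    fn get_proxy(&self) -> Option<&str> {
        if !self.is_enabled() || self.proxies.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.proxies.len();
        Some(self.proxies[i].as_str())
    }
}
```

Because the cursor only advances on successful selections, disabling and re-enabling the pool resumes rotation where it left off.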
@@ -763,16 +857,20 @@ search/
 ├── query.rs # SearchQuery
 ├── result.rs # SearchResult, SearchResults
 ├── aggregator.rs # Result aggregation and ranking
-├── search.rs # Search orchestrator
-├── proxy.rs # Proxy pool and configuration
+├── search.rs # Search orchestrator with HealthMonitor
+├── config.rs # HCL configuration loading
+├── health.rs # HealthMonitor, HealthConfig
+├── proxy.rs # ProxyPool, ProxyProvider, spawn_auto_refresh
 ├── fetcher.rs # PageFetcher trait, WaitStrategy
-├── fetcher_http.rs # HttpFetcher (reqwest wrapper)
+├── fetcher_http.rs # HttpFetcher + PooledHttpFetcher
+├── html_engine.rs # HtmlEngine<P> generic engine framework
 ├── browser.rs # BrowserPool, BrowserFetcher (headless browser)
 ├── browser_setup.rs # Chrome auto-detection and download
 └── engines/
     ├── mod.rs # Engine exports
     ├── duckduckgo.rs # DuckDuckGo
     ├── brave.rs # Brave Search
+    ├── bing.rs # Bing International
     ├── google.rs # Google (headless browser)
     ├── wikipedia.rs # Wikipedia
     ├── baidu.rs # Baidu (百度, headless browser)
@@ -816,17 +914,22 @@ A3S Search is a **utility component** of the A3S ecosystem.
 - [x] Consensus-based ranking algorithm
 - [x] Parallel async search execution
 - [x] Per-engine timeout handling
-- [x] 8 built-in engines (4 international + 4 Chinese)
+- [x] 9 built-in engines (5 international + 4 Chinese)
+- [x] Bing International engine (HTTP, no headless required)
 - [x] Headless browser support for JS-rendered engines (Google, Baidu, Bing China — enabled by default)
-- [x] PageFetcher abstraction (HttpFetcher + BrowserFetcher)
+- [x] PageFetcher abstraction (HttpFetcher + PooledHttpFetcher + BrowserFetcher)
 - [x] BrowserPool with tab concurrency control
-- [x] Proxy pool with dynamic provider support
+- [x] Dynamic proxy pool with pluggable `ProxyProvider` trait and `spawn_auto_refresh`
+- [x] `PooledHttpFetcher` for per-request proxy IP rotation
+- [x] Runtime proxy pool toggle via `AtomicBool` (`set_enabled(&self)`)
+- [x] Health monitoring with automatic engine suspension and recovery
+- [x] HCL configuration file loading for engines and health settings
 - [x] CLI tool with Homebrew distribution
 - [x] Automatic Chrome detection and download (Chrome for Testing)
-- [x] 298 comprehensive unit tests with 91.15% line coverage
 - [x] Proxy support for all engines via `-p` flag (HTTP/HTTPS/SOCKS5)
 - [x] UTF-8 safe content truncation for CJK/emoji
-- [x] Native SDKs: TypeScript (NAPI-RS) and Python (PyO3) with 103 tests
+- [x] Native SDKs: TypeScript (NAPI-RS) and Python (PyO3) with dynamic proxy pool management
+- [x] SDK proxy pool: `setProxyPool()`, `setProxyPoolEnabled()`, per-request `proxyPool` option
 
 ## License
 