A robust and scalable system for processing RSS/Atom feeds with webhook delivery capabilities.
- Queue-based feed processing with configurable size and priority
- Asynchronous API server with:
  - Thread-safe operations
  - Proper async/await support
  - Graceful shutdown handling
  - Enhanced error reporting
- Webhook delivery with:
  - Robust retry mechanism with exponential backoff
  - Configurable retry parameters (initial delay, max delay, backoff factor)
  - Rate limiting and batch processing
  - Circuit breaker pattern
  - Comprehensive retry tracking and metrics
- Advanced content analysis:
  - Intelligent content summarization
  - Multi-stage content processing
  - Quality assessment and validation
  - Customizable content filters
- Advanced performance optimization:
  - Dynamic batch sizing and thread management
  - Intelligent resource allocation
  - Real-time performance monitoring
  - Adaptive processing parameters
- Code quality and documentation:
  - Comprehensive module and method documentation
  - PEP 8 compliant code style
  - Full test coverage with pytest
  - Automated code quality checks (black, isort, flake8)
- Metrics and monitoring:
  - Prometheus integration
  - Performance tracking
  - Custom dashboards
  - Resource utilization monitoring
  - Webhook retry metrics
- Error handling:
  - Centralized error definitions
  - Automatic retry policies with exponential backoff
  - Error categorization and tracking
  - Detailed error reporting
- Modular architecture:
  - Dedicated configuration management
  - Pluggable queue implementations
  - Extensible validation system
- SQLite database integration:
  - Persistent storage of feed items
  - Tag-based organization
  - Efficient querying and retrieval
  - Automatic schema management
  - Batch processing support
- Real-time metrics monitoring with Prometheus integration
- Configurable webhook settings
- Thread-safe implementation
- Graceful shutdown handling
- Advanced error handling:
  - Circuit breaker pattern for service protection
  - Error tracking and metrics
  - Configurable error history
  - Sensitive data sanitization
  - Comprehensive error categorization
- Google Drive integration:
  - Automated folder structure creation
  - Standardized content organization
  - Metadata tracking
- Webhook processing:
  - Rate-limited API endpoints
  - Data validation
  - Error handling
- Rate limiting implementation (0.2s)
- Retry mechanism with exponential backoff
- Delivery queue manager
- Circuit breaker implementation
- Basic error handling system
- Error tracking and logging
- Critical error alerts
- Performance monitoring
- Entity detection fixes
- Basic keyword extraction improvements
- Technology category identification
- Content validation enhancements
- Support for multiple languages (English, Spanish, French)
- Cross-lingual topic alignment
- Similarity scoring across languages
- Language-specific NLP processing pipelines
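The retry behaviour listed among the webhook features above can be sketched as follows. This is a minimal illustration only: the parameter names mirror the configurable values mentioned (initial delay, max delay, backoff factor), and `deliver` stands in for the real delivery callable.

```python
import time
from typing import Callable


def deliver_with_retry(
    deliver: Callable[[], None],
    retry_count: int = 3,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    backoff_factor: float = 2.0,
) -> bool:
    """Attempt delivery, sleeping with exponential backoff between failures."""
    delay = initial_delay
    for attempt in range(retry_count + 1):
        try:
            deliver()
            return True
        except Exception:
            if attempt == retry_count:
                return False  # retries exhausted
            time.sleep(delay)
            # Grow the delay geometrically, but never beyond max_delay
            delay = min(delay * backoff_factor, max_delay)
    return False
```

Capping the delay at `max_delay` keeps a long outage from producing unbounded sleeps, while the geometric growth quickly backs off from a struggling endpoint.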
- Python 3.12+
- pip for package management
- Create a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure Google Drive credentials:
  - Create a project in Google Cloud Console
  - Enable Google Drive API
  - Create OAuth 2.0 credentials
  - Save credentials to `.env` file (see `.env.example`)

- Run tests:

  ```bash
  pytest
  ```

The project is organized as follows:

```
feed_processor/
├── config/       # Configuration management
│   ├── webhook_config.py
│   └── processor_config.py
├── core/         # Core processing logic
│   ├── processor.py
│   └── database.py
├── queues/       # Queue implementations
│   ├── base.py
│   └── content.py
├── metrics/      # Metrics collection
│   ├── prometheus.py
│   └── performance.py
├── validation/   # Input validation
│   └── validators.py
├── webhook/      # Webhook handling
│   └── manager.py
└── errors.py     # Error definitions
```
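To give a feel for the SQLite integration that `core/database.py` provides (persistent storage, tags, batch inserts, automatic schema management), here is a rough sketch. The table layout and function names are assumptions for illustration, not the actual implementation.

```python
import sqlite3
from typing import Iterable

SCHEMA = """
CREATE TABLE IF NOT EXISTS feed_items (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    url   TEXT UNIQUE,
    title TEXT,
    tags  TEXT  -- comma-separated tags for simple tag-based lookup
);
"""


def connect(db_path: str = "feeds.db") -> sqlite3.Connection:
    """Open the database and apply the schema automatically."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def store_batch(conn: sqlite3.Connection, items: Iterable[dict]) -> int:
    """Insert a batch of feed items in one transaction, skipping duplicate URLs."""
    rows = [(i["url"], i["title"], ",".join(i.get("tags", []))) for i in items]
    with conn:  # one transaction per batch
        cur = conn.executemany(
            "INSERT OR IGNORE INTO feed_items (url, title, tags) VALUES (?, ?, ?)",
            rows,
        )
    return cur.rowcount


def items_with_tag(conn: sqlite3.Connection, tag: str) -> list:
    """Retrieve titles of items carrying the given tag."""
    cur = conn.execute(
        "SELECT title FROM feed_items WHERE ',' || tags || ',' LIKE ?",
        (f"%,{tag},%",),
    )
    return [row[0] for row in cur.fetchall()]
```

Wrapping each batch in a single transaction is what makes batch inserts cheap in SQLite; the comma-delimited tag column is a deliberate simplification of whatever tag schema the real module uses.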
- Start the feed processor:

  ```bash
  python -m feed_processor start
  ```

- Process a specific feed:

  ```bash
  python -m feed_processor process --feed-url https://example.com/feed.xml
  ```

- View current metrics:

  ```bash
  python -m feed_processor metrics
  ```

Core settings are managed through environment variables or config files:
```bash
# Required settings
WEBHOOK_URL=https://api.example.com/webhook
WEBHOOK_TOKEN=your_token
MAX_QUEUE_SIZE=1000

# Optional settings
BATCH_SIZE=50
RETRY_COUNT=3
RATE_LIMIT=100
DB_PATH=feeds.db
```

For detailed configuration options, see the `config/` directory.
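One plausible way these settings might be read into a typed configuration object is sketched below; the actual modules under `config/` may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass
class ProcessorConfig:
    """Core settings read from environment variables, with documented defaults."""

    webhook_url: str
    max_queue_size: int = 1000
    batch_size: int = 50
    retry_count: int = 3
    rate_limit: int = 100
    db_path: str = "feeds.db"

    @classmethod
    def from_env(cls) -> "ProcessorConfig":
        env = os.environ
        return cls(
            webhook_url=env["WEBHOOK_URL"],  # required: raises KeyError if unset
            max_queue_size=int(env.get("MAX_QUEUE_SIZE", 1000)),
            batch_size=int(env.get("BATCH_SIZE", 50)),
            retry_count=int(env.get("RETRY_COUNT", 3)),
            rate_limit=int(env.get("RATE_LIMIT", 100)),
            db_path=env.get("DB_PATH", "feeds.db"),
        )
```

Failing fast on a missing `WEBHOOK_URL` (rather than defaulting it) matches its listing above as a required setting.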
Run the test suite:

```bash
pytest
```

For integration tests:

```bash
pytest tests/integration
```

The system exposes metrics via Prometheus at `/metrics`. Available metrics include:
- Queue size and processing rates
- Webhook delivery statistics
- Error rates and types
- Processing latency
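The real system integrates the `prometheus_client` library; the dependency-free sketch below only illustrates the shape of the counters and gauges behind a `/metrics` endpoint, including the text exposition format Prometheus scrapes.

```python
import threading
from collections import defaultdict


class Metrics:
    """A minimal, thread-safe stand-in for the Prometheus metrics listed above."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: dict = defaultdict(float)

    def inc(self, name: str, amount: float = 1.0) -> None:
        """Monotonic counter, e.g. webhook delivery attempts or errors."""
        with self._lock:
            self._values[name] += amount

    def set(self, name: str, value: float) -> None:
        """Gauge, e.g. current queue size."""
        with self._lock:
            self._values[name] = value

    def render(self) -> str:
        """Render all metrics in the Prometheus text exposition format."""
        with self._lock:
            return "\n".join(f"{k} {v}" for k, v in sorted(self._values.items()))
```

The lock matters because the processor is multi-threaded; `prometheus_client` performs the equivalent synchronization internally.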
Configuration is managed through dedicated modules in the config/ directory:
- `webhook_config.py`: Webhook delivery settings
- `processor_config.py`: Core processor settings
The system includes a dedicated pipeline for fetching content from Inoreader and storing it in Airtable. This pipeline:
- Fetches content from Inoreader using their API
- Processes and validates the content
- Stores the processed content in Airtable
- Includes comprehensive metrics and monitoring
To run the pipeline:
- Copy the example environment file and fill in your credentials:

  ```bash
  cp .env.example .env
  # Edit .env with your Inoreader and Airtable credentials
  ```

- Run the pipeline:

  ```bash
  python run_pipeline.py
  ```

The pipeline will:
- Start a Prometheus metrics server (default port: 9090)
- Continuously fetch new content from Inoreader
- Process content in configurable batch sizes
- Store processed content in Airtable
- Provide real-time metrics and monitoring
Configuration options (via environment variables):
- `BATCH_SIZE`: Number of items to process in each batch (default: 50)
- `FETCH_INTERVAL`: Time in seconds between fetch operations (default: 60.0)
- `METRICS_PORT`: Port for Prometheus metrics server (default: 9090)
- `DB_PATH`: Path to SQLite database file (default: feeds.db)
Monitor the pipeline using:
- Prometheus metrics at http://localhost:9090
- Structured logs in JSON format
- Airtable dashboard for stored content
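The structured JSON logs mentioned above can be produced with the standard library alone; a minimal formatter sketch follows (the field names are illustrative, not necessarily the pipeline's actual log schema).

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
        }
        return json.dumps(payload)


def make_json_logger(name: str = "pipeline") -> logging.Logger:
    """Return a logger whose records are emitted as JSON lines on stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

One JSON object per line keeps the output trivially parseable by log aggregators while staying readable during local development.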
- All new features must have corresponding test cases
- Tests should be written before implementation
- Current test coverage: ~85%
- Follow PEP 8 style guide
- Documentation required for all new features
- Code review required for all PRs
- Regular dependency updates
- Optimization Overview
- Quick Start Guide
- System Diagrams
- Advanced Diagrams
- Animated Workflows
- Metrics Visualization
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and ensure they pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.