All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Initial release of Financial News Scraper
- Multi-source news aggregation from 7 major sources
- Automated daily scraping via GitHub Actions
- Sentiment analysis using TextBlob
- Financial entity extraction (stocks, companies, persons)
- SQLite database with comprehensive schema
- Export functionality in 4 formats (JSON, CSV, XML, Parquet)
- RESTful API with FastAPI
- Comprehensive documentation suite
- Automated testing framework
- Data retention and cleanup policies
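The stdlib-only sketch below illustrates the sentiment and entity features listed above. It maps a polarity score to a label and pulls ticker symbols out of article text. It assumes polarity comes from TextBlob, whose `sentiment.polarity` lies in [-1.0, 1.0]. The ±0.1 neutral band, the cashtag and exchange-prefix regexes, and the names `label_sentiment` and `extract_tickers` are hypothetical and not taken from the project.

```python
import re

# Hypothetical neutral band: scores within +/-0.1 of zero are "neutral".
POSITIVE_THRESHOLD = 0.1
NEGATIVE_THRESHOLD = -0.1

# Cashtags such as "$AAPL" or "$BRK.B": 1-5 capitals, optional ".X" share class.
CASHTAG_RE = re.compile(r"\$([A-Z]{1,5}(?:\.[A-Z])?)\b")
# Exchange-qualified mentions such as "(NASDAQ: MSFT)" or "(NYSE:BRK.B)".
EXCHANGE_RE = re.compile(r"\((?:NASDAQ|NYSE|AMEX)\s*:\s*([A-Z]{1,5}(?:\.[A-Z])?)\)")


def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a coarse label."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > POSITIVE_THRESHOLD:
        return "positive"
    if polarity < NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def extract_tickers(text: str) -> list[str]:
    """Return unique ticker symbols in order of first appearance."""
    matches = [m for pattern in (CASHTAG_RE, EXCHANGE_RE) for m in pattern.finditer(text)]
    matches.sort(key=lambda m: m.start())  # interleave both patterns by position
    tickers: list[str] = []
    for match in matches:
        symbol = match.group(1)
        if symbol not in tickers:
            tickers.append(symbol)
    return tickers
```

With TextBlob installed, a call would look like `label_sentiment(TextBlob(text).sentiment.polarity)`. The regexes only catch explicitly marked tickers. Company and person names need a different approach, such as a named-entity model.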
Scraping Module
- Async HTTP requests with aiohttp
- RSS feed parsing with feedparser
- Content extraction with trafilatura
- Fallback parsing with BeautifulSoup
- Duplicate detection via URL uniqueness
- Error handling and retry logic
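The retry logic mentioned above can be sketched with plain `asyncio`. Each failed attempt waits exponentially longer, with a little random jitter, before trying again. The function name `with_retries`, its defaults, and the choice of which exceptions count as transient are assumptions. In an aiohttp scraper, one would likely also add `aiohttp.ClientError` to `retry_on`.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_request: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, asyncio.TimeoutError),
) -> T:
    """Await make_request(), retrying transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await make_request()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay * 1, 2, 4, ... plus up to 100 ms of jitter so that
            # many failing requests do not all retry in lockstep.
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
    raise AssertionError("unreachable")
```

`make_request` is a zero-argument factory rather than a coroutine object because a coroutine can only be awaited once, and each retry needs a fresh one. A call might look like `await with_retries(lambda: fetch_feed(session, url))`, where `fetch_feed` is a hypothetical helper.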
Database
- SQLite with SQLAlchemy ORM
- Indexed queries for performance
- Automatic schema creation
- Data retention management
- Scraping log tracking
Export System
- JSON: Full structured data
- CSV: Flattened spreadsheet format
- XML: Hierarchical structure
- Parquet: Compressed columnar format
- Daily summary statistics
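Below is a minimal sketch of the JSON and CSV exports and a daily summary, using only the standard library. The Parquet export (which needs pyarrow or pandas) and the XML export are left out. The article fields (`source`, `sentiment`, `tickers`) and the `;` separator used to flatten lists for CSV are assumptions, not the project's actual export schema.

```python
import csv
import io
import json
from collections import Counter


def to_json(articles: list[dict]) -> str:
    """Full structured export: nested values such as ticker lists are kept as-is."""
    return json.dumps(articles, indent=2, ensure_ascii=False)


def to_csv(articles: list[dict]) -> str:
    """Flattened export: list values are joined with ';' so each article is one row."""
    fieldnames = sorted({key for article in articles for key in article})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    for article in articles:
        writer.writerow({
            key: ";".join(map(str, value)) if isinstance(value, list) else value
            for key, value in article.items()
        })
    return buffer.getvalue()


def daily_summary(articles: list[dict]) -> dict:
    """Article count, count per source, and mean sentiment (None if no scores)."""
    scores = [a["sentiment"] for a in articles if a.get("sentiment") is not None]
    return {
        "total_articles": len(articles),
        "by_source": dict(Counter(a["source"] for a in articles)),
        "avg_sentiment": round(sum(scores) / len(scores), 3) if scores else None,
    }
```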
API
- Article retrieval with filtering
- Full-text search
- Statistics and aggregations
- Export endpoints
- Pagination support
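Filtering and pagination of the kind listed above can be written independently of FastAPI, so a route handler only has to forward its query parameters. Several details here are assumptions:

* the parameter names,
* the default page size of 20,
* the cap of 100,
* the case-insensitive title substring match, which is a naive stand-in for real full-text search.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Page:
    items: list
    total: int
    page: int
    page_size: int

    @property
    def pages(self) -> int:
        return ceil(self.total / self.page_size) if self.total else 0


def query_articles(articles, source=None, min_sentiment=None, q=None,
                   page=1, page_size=20, max_page_size=100) -> Page:
    """Filter articles, then return one page using offset = (page - 1) * page_size."""
    if page < 1:
        raise ValueError("page must be >= 1")
    page_size = max(1, min(page_size, max_page_size))  # clamp client-supplied size
    results = [
        a for a in articles
        if (source is None or a["source"] == source)
        and (min_sentiment is None
             or (a.get("sentiment") is not None and a["sentiment"] >= min_sentiment))
        and (q is None or q.lower() in a["title"].lower())
    ]
    offset = (page - 1) * page_size
    return Page(results[offset:offset + page_size], len(results), page, page_size)
```

Returning the total alongside the page lets clients compute the page count without a second request.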
Automation
- GitHub Actions workflow
- Daily scheduled runs (2:00 AM UTC)
- Manual trigger capability
- Artifact management
- Git-based version control
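GitHub Actions `schedule` triggers take POSIX cron expressions evaluated in UTC. A daily run at 2:00 AM UTC therefore corresponds to `0 2 * * *`, and manual runs use the `workflow_dispatch` trigger. The sketch below computes when the next scheduled run is due, converting any timezone-aware input to UTC first. The name `next_run` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# 2:00 AM UTC daily (cron equivalent: "0 2 * * *").
RUN_AT = time(2, 0, tzinfo=timezone.utc)


def next_run(now: datetime) -> datetime:
    """Return the next daily 02:00 UTC run strictly after `now`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), RUN_AT)  # inherits UTC from RUN_AT
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

Scheduled workflows can start later than their cron time when GitHub is under heavy load. The result is the earliest start time, not a guaranteed one.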
Documentation
- Installation guide
- Usage guide with examples
- Complete API reference
- System architecture documentation
- Test results and metrics
- Contributing guidelines
- Code of conduct
Performance
- Scraping speed: 2.6 articles/second
- Success rate: 99.5%
- Export time: <5 seconds for all formats
- Memory efficient async processing
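For scale, the reported 2.6 articles/second works out to roughly 1,000 / 2.6 ≈ 385 seconds (about 6.4 minutes) per 1,000 articles. A 99.5% success rate means about 5 failures per 1,000 attempts.

The sketch below shows one common way to get memory-bounded async processing and to measure those two numbers:

* An `asyncio.Semaphore` caps how many fetches are in flight.
* `asyncio.gather(..., return_exceptions=True)` counts failures instead of aborting the batch.

The `scrape_all` name, the `fetch` callable, and the concurrency limit of 10 are hypothetical. This is not necessarily how the project reaches its figures.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def scrape_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[object]],
    max_concurrency: int = 10,
) -> dict:
    """Fetch all URLs with bounded concurrency and report throughput metrics."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> object:
        async with semaphore:  # at most max_concurrency fetches run at once
            return await fetch(url)

    start = time.perf_counter()
    # return_exceptions=True turns each failure into a result instead of
    # cancelling the whole batch.
    results = await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)
    elapsed = time.perf_counter() - start

    succeeded = sum(1 for r in results if not isinstance(r, BaseException))
    return {
        "succeeded": succeeded,
        "failed": len(results) - succeeded,
        "success_rate": succeeded / len(results) if results else 0.0,
        "articles_per_second": succeeded / elapsed if elapsed > 0 else 0.0,
    }
```

`gather` still creates one task per URL up front. For very large batches, a fixed pool of worker tasks reading from an `asyncio.Queue` keeps that overhead bounded too.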
Known Issues
- Reuters: RSS feed returns HTTP 404 (the feed URL needs updating)
- Yahoo Finance: HTTP header length errors on some articles
- Seeking Alpha: rate limiting causes HTTP 403 errors
Dependencies
- Python 3.11+
- aiohttp 3.9.1
- feedparser 6.0.10
- trafilatura 1.6.0
- beautifulsoup4 4.12.2
- textblob 0.17.1
- sqlalchemy 2.0.23
- fastapi 0.104.1
- pandas 2.1.4
- pyarrow 14.0.1
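A small start-up check can catch drift between the installed environment and the versions listed above. This sketch uses `importlib.metadata.version` from the standard library. It assumes the listed versions are exact pins, since the changelog does not say whether they are minimums. The `check_environment` name is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this changelog, keyed by PyPI distribution name.
PINNED = {
    "aiohttp": "3.9.1",
    "feedparser": "6.0.10",
    "trafilatura": "1.6.0",
    "beautifulsoup4": "4.12.2",
    "textblob": "0.17.1",
    "sqlalchemy": "2.0.23",
    "fastapi": "0.104.1",
    "pandas": "2.1.4",
    "pyarrow": "14.0.1",
}


def check_environment(pins: dict[str, str] = PINNED,
                      min_python: tuple[int, int] = (3, 11)) -> list[str]:
    """Return human-readable problems; an empty list means the environment matches."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(map(str, min_python))
        problems.append(f"Python {required}+ required, found {sys.version.split()[0]}")
    for dist, expected in pins.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (expected {expected})")
            continue
        if installed != expected:  # exact-pin assumption
            problems.append(f"{dist}: {installed} installed, {expected} pinned")
    return problems
```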
Planned Features
- Real-time scraping with WebSockets
- Advanced ML models for entity extraction
- Multi-language support
- Cryptocurrency news sources
- Social media integration
- Data visualization dashboard
- PostgreSQL migration
- Caching layer
- GraphQL API
- Full-text search engine
- Comprehensive test suite
- Performance optimizations
Versioning
- Major version (X.0.0): Breaking changes
- Minor version (0.X.0): New features, backward compatible
- Patch version (0.0.X): Bug fixes, backward compatible
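The three rules above amount to a simple bump function. This sketch handles plain `MAJOR.MINOR.PATCH` strings only; pre-release and build suffixes such as `1.0.0-rc.1` are out of scope. The name `bump` is hypothetical.

```python
import re

SEMVER_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def bump(version: str, part: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' release."""
    match = SEMVER_RE.fullmatch(version)
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = map(int, match.groups())
    if part == "major":  # breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":  # backward-compatible feature: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```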
Release Process
- Update CHANGELOG.md
- Update version in setup.py
- Create git tag
- Push to GitHub
- Create GitHub release
- Deploy to production
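Most of the release steps above can be scripted. The sketch below rewrites the `version=` argument in `setup.py` and builds the git and GitHub CLI (`gh release create`) commands as argument lists. These can be reviewed as a dry run before each is passed to `subprocess.run(cmd, check=True)`.

The following details are assumptions:

* the `v`-prefixed tag format,
* the commit message,
* the release-notes text,
* the function names.

Editing the CHANGELOG.md content and deploying to production are project-specific and left out.

```python
import re

# Matches version="1.0.0" or version='1.0.0' (first occurrence only).
VERSION_RE = re.compile(r"""(version\s*=\s*)(["'])([^"']+)\2""")


def set_setup_version(setup_py: str, new_version: str) -> str:
    """Return setup.py source with its version= argument replaced."""
    updated, count = VERSION_RE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_version}{m.group(2)}", setup_py, count=1
    )
    if count != 1:
        raise ValueError("no version= argument found in setup.py")
    return updated


def release_commands(version: str) -> list[list[str]]:
    """Commands for commit, tag, push, and GitHub release (not executed here)."""
    tag = f"v{version}"
    return [
        ["git", "add", "CHANGELOG.md", "setup.py"],
        ["git", "commit", "-m", f"Release {tag}"],
        ["git", "tag", "-a", tag, "-m", f"Release {tag}"],
        ["git", "push", "origin", "HEAD"],
        ["git", "push", "origin", tag],
        ["gh", "release", "create", tag, "--title", tag,
         "--notes", f"See CHANGELOG.md for {tag}."],
    ]
```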
Support Policy
- Current version: Full support
- Previous major version: Security fixes only
- Older versions: No support
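The support policy above can be expressed as a lookup on the major version. This sketch assumes "current version" means the current major release line, although the policy could also be read as covering only the latest release. The name `support_level` is hypothetical.

```python
def support_level(version: str, current: str) -> str:
    """Return 'full', 'security', or 'none' for `version` given the `current` release.

    Assumption: every release in the current major line counts as "current".
    """
    major = int(version.split(".", 1)[0])
    current_major = int(current.split(".", 1)[0])
    if major > current_major:
        raise ValueError("version is newer than the current release")
    if major == current_major:
        return "full"
    if major == current_major - 1:
        return "security"  # previous major version: security fixes only
    return "none"
```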
For the detailed commit history, see the project's commits on GitHub.