A professional, modular web scraper for monitoring new car listings from AutoScout24.
- Modular Architecture: Clean separation of concerns with dedicated modules for browser automation, scraping, database operations, and notifications
- Automated Monitoring: Continuously monitors for new car listings
- Smart Data Extraction: Extracts comprehensive car details including make, model, price, transmission, and more
- Database Storage: SQLite database with automatic cleanup of old listings
- Telegram Notifications: Optional real-time notifications for new listings and errors
- User-Agent Rotation: Random User-Agent headers to avoid detection
- Error Handling: Robust error handling with logging and notifications
- Configurable: Easy configuration via environment variables
autoscoutscrapper/
├── src/
│ ├── __init__.py # Package initialization
│ ├── config.py # Configuration settings
│ ├── browser.py # Playwright browser automation
│ ├── scraper.py # Scraping logic
│ ├── database.py # Database operations
│ ├── notifier.py # Telegram notifications
│ └── utils.py # Utility functions and logging
├── main.py # Entry point
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── .gitignore # Git ignore rules
└── README.md # This file
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection
- Clone the repository:
  git clone https://github.com/mxyldrm/autoscoutscrapper.git
  cd autoscoutscrapper
- Create a virtual environment (recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Install Playwright browsers:
  playwright install chromium
- Configure environment variables:
  cp .env.example .env
  Edit .env and add your Telegram credentials (optional):
  TELEGRAM_API_KEY=your_bot_token_here
  TELEGRAM_CHAT_ID=your_chat_id_here
All configuration is handled in src/config.py and can be overridden with environment variables:
- SCRAPE_INTERVAL: Time between scraping cycles, in seconds (default: 60)
- PAGES_TO_SCRAPE: List of pages to scrape (default: [1, 2])
- BROWSER_HEADLESS: Run the browser in headless mode (default: True)
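The snippet below is a minimal sketch of how `src/config.py` might read these overrides. The `Settings` and `load_settings` names are hypothetical, and it assumes `PAGES_TO_SCRAPE` is supplied as a comma-separated string such as `1,2,3`; the real module may use a different format.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class Settings:
    scrape_interval: int = 60  # seconds between scraping cycles
    pages_to_scrape: List[int] = field(default_factory=lambda: [1, 2])
    browser_headless: bool = True


def _parse_bool(value: str) -> bool:
    # Treat common truthy spellings as True; anything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    settings = Settings()
    if "SCRAPE_INTERVAL" in env:
        settings.scrape_interval = int(env["SCRAPE_INTERVAL"])
    if "PAGES_TO_SCRAPE" in env:
        # Assumed format (hypothetical): comma-separated page numbers, e.g. "1,2,3".
        settings.pages_to_scrape = [
            int(p) for p in env["PAGES_TO_SCRAPE"].split(",") if p.strip()
        ]
    if "BROWSER_HEADLESS" in env:
        settings.browser_headless = _parse_bool(env["BROWSER_HEADLESS"])
    return settings
```

For example, `load_settings({"SCRAPE_INTERVAL": "120", "PAGES_TO_SCRAPE": "1,2,3"})` yields a 120-second interval and pages `[1, 2, 3]`, while headless mode keeps its default. Passing a mapping explicitly keeps the function easy to test without touching the real environment.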
To enable Telegram notifications:
- Create a bot via @BotFather
- Get your chat ID from @userinfobot
- Set TELEGRAM_API_KEY and TELEGRAM_CHAT_ID in .env
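For reference, a notification like this boils down to one call to the Telegram Bot API `sendMessage` method. The sketch below uses only the standard library (`urllib`) rather than Requests, and the helper names are hypothetical; it is not necessarily how `src/notifier.py` is written.

```python
import json
import urllib.parse
import urllib.request
from typing import Tuple

TELEGRAM_API_BASE = "https://api.telegram.org"


def build_send_message_request(token: str, chat_id: str, text: str) -> Tuple[str, bytes]:
    """Return the sendMessage URL and the form-encoded request body."""
    url = f"{TELEGRAM_API_BASE}/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return url, body


def send_telegram_message(token: str, chat_id: str, text: str, timeout: float = 10.0) -> bool:
    """POST the message; return True if Telegram answers with "ok": true.

    urlopen raises urllib.error.HTTPError for non-2xx responses
    (e.g. a wrong token or chat ID), so callers should catch it.
    """
    url, body = build_send_message_request(token, chat_id, text)
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return bool(payload.get("ok"))
```

Splitting request construction from sending keeps the URL and payload testable offline; only `send_telegram_message` needs network access and valid credentials.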
- Default: SQLite database (autoscout.db)
- Old listings are automatically deleted after 7 days
- Supports PostgreSQL (set DATABASE_URL in .env)
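A minimal sketch of the store-new-then-clean-up pattern using the standard `sqlite3` module. The `listings` schema and the function names are hypothetical; the sketch relies on `INSERT OR IGNORE` against a primary key to tell new listings apart from ones already seen, and deletes rows whose `first_seen` timestamp is older than the retention window (7 days by default, matching the behavior described above).

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical schema: the real project may store more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    id TEXT PRIMARY KEY,        -- listing ID (assumed unique per car)
    title TEXT NOT NULL,
    price INTEGER,
    first_seen INTEGER NOT NULL -- Unix timestamp (seconds, UTC)
)
"""


def open_db(path: str = "autoscout.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_if_new(conn: sqlite3.Connection, listing_id: str, title: str,
                  price: Optional[int], now: Optional[datetime] = None) -> bool:
    """Insert a listing; return True only if its ID was not stored before."""
    if now is None:
        now = datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT OR IGNORE INTO listings (id, title, price, first_seen) VALUES (?, ?, ?, ?)",
        (listing_id, title, price, int(now.timestamp())),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the primary key already existed


def delete_older_than(conn: sqlite3.Connection, days: int = 7,
                      now: Optional[datetime] = None) -> int:
    """Delete listings first seen more than `days` ago; return rows removed."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = int((now - timedelta(days=days)).timestamp())
    cur = conn.execute("DELETE FROM listings WHERE first_seen < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Only listings for which `insert_if_new` returns True would trigger a notification, so a car seen on every cycle is reported once.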
Run the scraper:
python main.py
The scraper will:
- Open the AutoScout24 search page
- Find the JSON API endpoint
- Scrape car listings from multiple pages
- Store new listings in the database
- Send Telegram notifications for new cars (if configured)
- Clean up old listings
- Wait for the configured interval and repeat
Press Ctrl+C to gracefully stop the scraper.
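The cycle above could be driven by a loop like the following sketch. `run_cycle` stands for a hypothetical callable that performs one pass (find the endpoint, scrape, store, notify, clean up), and `max_cycles` is an illustrative parameter added so the loop can be tested. A failed cycle is logged and retried after the next interval, while Ctrl+C raises `KeyboardInterrupt`, which ends the loop cleanly.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("autoscout")  # hypothetical logger name


def run_forever(run_cycle: Callable[[], None], interval: float = 60.0,
                max_cycles: Optional[int] = None) -> int:
    """Call run_cycle repeatedly, sleeping `interval` seconds between cycles.

    Returns the number of cycles attempted before stopping.
    """
    cycles = 0
    try:
        while max_cycles is None or cycles < max_cycles:
            try:
                run_cycle()
            except Exception:
                # One failed cycle (network error, changed JSON, ...) should not kill the monitor.
                logger.exception("Scraping cycle failed")
            cycles += 1
            if max_cycles is None or cycles < max_cycles:
                time.sleep(interval)
    except KeyboardInterrupt:
        # KeyboardInterrupt is a BaseException, so the inner handler does not swallow it.
        logger.info("Stopped by user")
    return cycles
```

Catching `Exception` rather than `BaseException` inside the loop is what keeps Ctrl+C working as a graceful stop.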
src/config.py: Contains all configuration settings, including URLs, timeouts, User-Agents, and environment variables.
src/browser.py: Handles browser automation with Playwright, finding the JSON API endpoint by monitoring network requests.
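A sketch of endpoint discovery with Playwright's sync API: register a `response` listener, load the page, and keep the URLs of XHR/fetch responses that carry a JSON content type. That heuristic, the function names, and the timeout are assumptions; the actual `browser.py` may match a specific URL pattern instead. Playwright is imported inside the function so the filter can be used without it installed.

```python
from typing import List, Optional


def is_json_api_response(resource_type: str, content_type: str) -> bool:
    """Heuristic (assumed): XHR/fetch responses returning JSON are API candidates."""
    return resource_type in ("xhr", "fetch") and "application/json" in content_type.lower()


def find_json_endpoint(search_url: str, headless: bool = True,
                       timeout_ms: int = 30_000) -> Optional[str]:
    """Open search_url in Chromium and return the first JSON API URL observed."""
    # Lazy import: only this function needs Playwright and its browsers.
    from playwright.sync_api import sync_playwright

    candidates: List[str] = []

    def on_response(response) -> None:
        # Playwright lower-cases header names in response.headers.
        content_type = response.headers.get("content-type", "")
        if is_json_api_response(response.request.resource_type, content_type):
            candidates.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.on("response", on_response)
            page.goto(search_url, timeout=timeout_ms)
            # Give late XHR calls a chance to fire before closing.
            page.wait_for_load_state("networkidle", timeout=timeout_ms)
        finally:
            browser.close()
    return candidates[0] if candidates else None
```

Once the endpoint is known, later requests can go straight to the JSON API, which is much lighter than rendering every results page in a browser.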
src/scraper.py: Contains the main scraping logic, which fetches and parses car listings from the JSON API.
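A minimal sketch of the fetch-and-parse step, including User-Agent rotation via `random.choice`. The payload shape (`{"listings": [{"id": ..., "make": ..., ...}]}`), the `CarListing` fields, and the example User-Agent strings are all hypothetical; the real AutoScout24 response format is not documented here and likely differs.

```python
import json
import random
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Example strings only; the project keeps its own list in src/config.py.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


@dataclass
class CarListing:
    id: str
    make: str
    model: str
    price: Optional[int]
    transmission: Optional[str]


def fetch_json(url: str, timeout: float = 15.0) -> Dict[str, Any]:
    """GET url with a randomly chosen User-Agent and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def parse_listings(payload: Dict[str, Any]) -> List[CarListing]:
    """Map the (assumed) payload into CarListing objects, skipping items without an ID."""
    cars: List[CarListing] = []
    for item in payload.get("listings", []):
        if "id" not in item:
            continue  # without an ID the listing cannot be de-duplicated
        cars.append(CarListing(
            id=str(item["id"]),
            make=item.get("make", ""),
            model=item.get("model", ""),
            price=item.get("price"),
            transmission=item.get("transmission"),
        ))
    return cars
```

Keeping parsing separate from fetching means a change in the site's JSON only touches `parse_listings`, and it can be tested against saved payloads.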
src/database.py: Manages SQLite database operations, including inserting, updating, and deleting car listings.
src/notifier.py: Sends Telegram notifications for new listings and errors.
src/utils.py: Utility functions for logging, price formatting, and feature extraction.
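Hypothetical versions of the price-formatting and feature-extraction helpers, assuming prices arrive as plain numbers and features as lists of equipment names; the real `utils.py` may format currency or match features differently.

```python
from typing import Iterable, List, Optional


def format_price(price: Optional[float], currency: str = "€") -> str:
    """Render a price with thousands separators, e.g. 12500 -> '€12,500'."""
    if price is None:
        return "N/A"
    return f"{currency}{price:,.0f}"  # rounds to whole units


def extract_features(equipment: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Return the wanted features present in `equipment`, matched case-insensitively."""
    present = {item.strip().lower() for item in equipment}
    return [w for w in wanted if w.lower() in present]
```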
Logs are written to both:
- Console (stdout)
- File: autoscout_scraper.log
Log levels can be configured via the LOG_LEVEL environment variable.
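One way to get this dual console/file output with the standard `logging` module, reading the level from `LOG_LEVEL` and falling back to `INFO` for unknown values. The `setup_logging` name, the `autoscout` logger name, and the format string are assumptions.

```python
import logging
import os
import sys


def setup_logging(log_file: str = "autoscout_scraper.log") -> logging.Logger:
    """Log to stdout and to log_file; level comes from LOG_LEVEL (default INFO)."""
    level = getattr(logging, os.environ.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
    if not isinstance(level, int):  # e.g. LOG_LEVEL=BASIC_FORMAT is not a level
        level = logging.INFO
    fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    logger = logging.getLogger("autoscout")
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via root handlers
    # Remove and close old handlers so repeated calls don't duplicate output.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

For example, running with `LOG_LEVEL=DEBUG python main.py` would show debug messages on both outputs, assuming the entry point calls a setup function like this one.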
# Install dev dependencies
pip install pytest
# Run tests (if implemented)
pytest
# Install black
pip install black
# Format code
black src/ main.py

If Playwright fails to launch:
playwright install chromium

If database errors occur, delete the database file and restart:
rm autoscout.db
python main.py

If scraping fails consistently:
- Check your internet connection
- Verify the AutoScout24 website is accessible
- Check if the website structure has changed
IMPORTANT: This project is strictly for educational purposes only.
- Web scraping may violate the terms of service of websites
- Users are solely responsible for how they use this code
- The authors assume no liability for misuse or damages
- Always respect robots.txt and website terms of service
- Use reasonable scraping intervals to avoid overloading servers
This tool is provided "as is" without warranty of any kind. Use at your own risk. Always comply with:
- Website terms of service
- Local laws and regulations regarding web scraping
- Data protection regulations (GDPR, etc.)
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
mxyldrm
- Built with Playwright for browser automation
- Uses Requests for HTTP requests
- Database management with SQLite
Remember: This is an educational project. Always use web scraping responsibly and ethically.