AutoScout24 Scraper

A modular web scraper that monitors AutoScout24 for new car listings.

Features

Modular Architecture: Clean separation of concerns with dedicated modules for browser automation, scraping, database operations, and notifications
Automated Monitoring: Continuously monitors for new car listings
Smart Data Extraction: Extracts comprehensive car details including make, model, price, transmission, and more
Database Storage: SQLite database with automatic cleanup of old listings
Telegram Notifications: Optional real-time notifications for new listings and errors
User-Agent Rotation: Random User-Agent headers to avoid detection
Error Handling: Robust error handling with logging and notifications
Configurable: Easy configuration via environment variables

Project Structure

```
autoscoutscrapper/
├── src/
│   ├── __init__.py       # Package initialization
│   ├── config.py         # Configuration settings
│   ├── browser.py        # Playwright browser automation
│   ├── scraper.py        # Scraping logic
│   ├── database.py       # Database operations
│   ├── notifier.py       # Telegram notifications
│   └── utils.py          # Utility functions and logging
├── main.py               # Entry point
├── requirements.txt      # Python dependencies
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore rules
└── README.md             # This file
```

Prerequisites

Python 3.8 or higher
pip (Python package manager)
Internet connection

Installation

Clone the repository:

```
git clone https://github.com/mxyldrm/autoscoutscrapper.git
cd autoscoutscrapper
```

Create a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:

```
pip install -r requirements.txt
```

Install Playwright browsers:

```
playwright install chromium
```

Configure environment variables:

```
cp .env.example .env
```

Edit .env and add your Telegram credentials (optional):

```
TELEGRAM_API_KEY=your_bot_token_here
TELEGRAM_CHAT_ID=your_chat_id_here
```

Configuration

All configuration is handled in src/config.py and can be overridden with environment variables:

Core Settings

SCRAPE_INTERVAL: Time between scraping cycles (default: 60 seconds)
PAGES_TO_SCRAPE: List of pages to scrape (default: [1, 2])
BROWSER_HEADLESS: Run browser in headless mode (default: True)
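
As an illustration of how these three settings might be read from the environment, here is a minimal sketch. It is not the project's actual `src/config.py`: the function name `load_settings`, the comma-separated format for `PAGES_TO_SCRAPE`, and the accepted truthy strings for `BROWSER_HEADLESS` are all assumptions made for the example. Only the variable names and defaults come from the list above.

```python
import os
from typing import Mapping, Optional

# Strings treated as "true" for boolean settings (an assumption of this sketch).
_TRUTHY = {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the core settings, falling back to the documented defaults.

    Hypothetical helper: PAGES_TO_SCRAPE is assumed to be a comma-separated
    string such as "1,2"; the real project may encode the list differently.
    """
    env = os.environ if env is None else env
    pages_raw = env.get("PAGES_TO_SCRAPE", "1,2")
    headless_raw = env.get("BROWSER_HEADLESS", "True")
    return {
        "scrape_interval": int(env.get("SCRAPE_INTERVAL", "60")),
        "pages_to_scrape": [int(p) for p in pages_raw.split(",") if p.strip()],
        "browser_headless": headless_raw.strip().lower() in _TRUTHY,
    }
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"SCRAPE_INTERVAL": "120"})`.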

Telegram Notifications (Optional)

To enable Telegram notifications:

Create a bot via @BotFather
Get your chat ID from @userinfobot
Set TELEGRAM_API_KEY and TELEGRAM_CHAT_ID in .env
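
To show what the two credentials are used for, the sketch below sends a text message through the Telegram Bot API `sendMessage` method. It is an illustration rather than the code in `src/notifier.py`: the function names are hypothetical, and it uses the standard library instead of Requests so that it runs without extra dependencies.

```python
import json
import urllib.parse
import urllib.request


def build_send_message_request(api_key: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{api_key}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def send_message(api_key: str, chat_id: str, text: str, timeout: float = 10.0) -> bool:
    """Send the message and report whether Telegram answered with "ok": true."""
    request = build_send_message_request(api_key, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return bool(payload.get("ok"))
```

A quick manual check after filling in `.env` could be `send_message(os.environ["TELEGRAM_API_KEY"], os.environ["TELEGRAM_CHAT_ID"], "test")`, assuming the variables have been loaded into the process environment.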

Database

Default: SQLite database (autoscout.db)
Old listings are automatically deleted after 7 days
Supports PostgreSQL (set DATABASE_URL in .env)
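
The store-then-expire behavior described above can be sketched with the standard `sqlite3` module. The table name `listings`, its columns, and the helper names are hypothetical and chosen only for this example; the project's real schema in `src/database.py` may differ.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86400

# Hypothetical schema: one row per listing, keyed by the listing ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    id TEXT PRIMARY KEY,
    title TEXT,
    price INTEGER,
    first_seen REAL NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_if_new(conn: sqlite3.Connection, listing_id: str, title: str,
                  price: int, now: Optional[float] = None) -> bool:
    """Insert a listing; return True only if it was not stored before."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "INSERT OR IGNORE INTO listings (id, title, price, first_seen) VALUES (?, ?, ?, ?)",
        (listing_id, title, price, now),
    )
    conn.commit()
    return cur.rowcount == 1


def delete_older_than(conn: sqlite3.Connection, days: int = 7,
                      now: Optional[float] = None) -> int:
    """Delete listings first seen more than `days` days ago; return the count."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    cur = conn.execute("DELETE FROM listings WHERE first_seen < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Using the return value of `insert_if_new` as the "is this listing new?" signal keeps the duplicate check and the insert in a single statement.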

Usage

Run the scraper:

```
python main.py
```

The scraper will:

Open the AutoScout24 search page
Find the JSON API endpoint by monitoring the page's network requests
Scrape car listings from multiple pages
Store new listings in the database
Send Telegram notifications for new cars (if configured)
Clean up old listings
Wait for the configured interval and repeat

Stopping the Scraper

Press Ctrl+C to gracefully stop the scraper.
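
The repeat-and-wait cycle and the Ctrl+C behavior could be structured roughly as follows. This is a sketch, not the contents of `main.py`: `run_forever`, `run_cycle`, and the `max_cycles` parameter are hypothetical names, with `run_cycle` standing in for one pass of scraping, storing, notifying, and cleanup.

```python
import time
from typing import Callable, Optional


def run_forever(run_cycle: Callable[[], None],
                interval_seconds: float = 60.0,
                sleep: Callable[[float], None] = time.sleep,
                max_cycles: Optional[int] = None) -> int:
    """Call run_cycle repeatedly, sleeping between cycles.

    Returns the number of completed cycles. Ctrl+C (KeyboardInterrupt)
    ends the loop quietly instead of printing a traceback.
    """
    completed = 0
    try:
        while max_cycles is None or completed < max_cycles:
            run_cycle()
            completed += 1
            # Skip the final sleep when a cycle limit has been reached.
            if max_cycles is None or completed < max_cycles:
                sleep(interval_seconds)
    except KeyboardInterrupt:
        pass  # Graceful stop requested by the user.
    return completed
```

Injecting `sleep` and `max_cycles` is a design choice for the example only; it lets the loop be exercised without actually waiting.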

Module Documentation

`src/config.py`

Contains all configuration settings including URLs, timeouts, User-Agents, and environment variables.

`src/browser.py`

Handles browser automation using Playwright to find the JSON API endpoint by monitoring network requests.
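
For readers unfamiliar with the technique, here is a minimal sketch of discovering JSON endpoints by listening to responses with Playwright's sync API. It is not the project's `src/browser.py`: the function names and the `url_hint` filter are assumptions, and which URL the real scraper selects is not documented here.

```python
from typing import List


def is_json_response(url: str, content_type: str, url_hint: str = "") -> bool:
    """True if the response is JSON and its URL contains the optional hint."""
    return "application/json" in content_type.lower() and url_hint in url


def find_json_endpoints(search_url: str, url_hint: str = "",
                        headless: bool = True, timeout_ms: int = 30000) -> List[str]:
    """Load the page and collect the URLs of JSON responses seen while loading."""
    # Imported lazily so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    found: List[str] = []

    def on_response(response) -> None:
        content_type = response.headers.get("content-type", "")
        if is_json_response(response.url, content_type, url_hint):
            found.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.on("response", on_response)
            page.goto(search_url, wait_until="networkidle", timeout=timeout_ms)
        finally:
            browser.close()
    return found
```

The `headless` argument corresponds to the `BROWSER_HEADLESS` setting; running with `headless=False` can help when debugging why no endpoint is found.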

`src/scraper.py`

Main scraping logic that fetches and parses car listings from the JSON API.
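
A fetch step with a randomly chosen User-Agent might look like the sketch below. Everything specific in it is an assumption for illustration: the User-Agent strings, the `page` query parameter, and the function names are not taken from the project, and the shape of the returned JSON is left to the caller.

```python
import random
from typing import Dict, Optional, Sequence

# Example User-Agent strings (placeholders; the project keeps its own list in config).
USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
)


def build_headers(user_agents: Sequence[str] = USER_AGENTS,
                  rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Pick one User-Agent at random and request a JSON response."""
    chooser = rng if rng is not None else random
    return {"User-Agent": chooser.choice(user_agents), "Accept": "application/json"}


def fetch_page(api_url: str, page: int, timeout: float = 15.0) -> dict:
    """Fetch one result page as parsed JSON (the `page` parameter is hypothetical)."""
    import requests  # Imported lazily; listed in requirements.txt per the README.

    response = requests.get(api_url, params={"page": page},
                            headers=build_headers(), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Accepting an explicit `random.Random` instance makes the header choice reproducible in tests.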

`src/database.py`

Manages SQLite database operations including inserting, updating, and deleting car listings.

`src/notifier.py`

Sends Telegram notifications for new listings and errors.

`src/utils.py`

Utility functions for logging, price formatting, and feature extraction.

Logging

Logs are written to both:

Console (stdout)
File: autoscout_scraper.log

The log level can be configured via the LOG_LEVEL environment variable.
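
A logger that writes to both destinations and honors `LOG_LEVEL` can be set up with the standard `logging` module, as in this sketch. The logger name `autoscout`, the `INFO` fallback, and the message format are assumptions; the project's own setup lives in `src/utils.py` and may differ.

```python
import logging
import os
import sys
from typing import Optional

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def setup_logging(log_file: str = "autoscout_scraper.log",
                  level_name: Optional[str] = None) -> logging.Logger:
    """Configure a logger that writes to stdout and to a file."""
    # Fallback to INFO is an assumption; unknown level names also map to INFO.
    level_name = level_name or os.environ.get("LOG_LEVEL", "INFO")
    level = _LEVELS.get(level_name.upper(), logging.INFO)

    logger = logging.getLogger("autoscout")
    logger.setLevel(level)
    logger.handlers.clear()  # Avoid duplicate output if called twice.

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(log_file, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

With this sketch, running `LOG_LEVEL=DEBUG python main.py` would raise the verbosity for both the console and the file.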

Development

Running Tests

```
# Install dev dependencies
pip install pytest

# Run tests (if implemented)
pytest
```

Code Formatting

```
# Install black
pip install black

# Format code
black src/ main.py
```

Troubleshooting

Browser Issues

If Playwright fails to launch:

```
playwright install chromium
```

Database Issues

If database errors occur, delete the database file and restart. Note that this discards all stored listings:

```
rm autoscout.db
python main.py
```

Network Issues

If scraping fails consistently:

Check your internet connection
Verify the AutoScout24 website is accessible
Check if the website structure has changed

Disclaimer

IMPORTANT: This project is for educational purposes only.

Web scraping may violate the terms of service of websites
Users are solely responsible for how they use this code
The authors assume no liability for misuse or damages
Always respect robots.txt and website terms of service
Use reasonable scraping intervals to avoid overloading servers

Legal Notice

This tool is provided "as is" without warranty of any kind. Use at your own risk. Always comply with:

Website terms of service
Local laws and regulations regarding web scraping
Data protection regulations (GDPR, etc.)

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

Author

mxyldrm

Acknowledgments

Built with Playwright for browser automation
Uses Requests for HTTP requests
Database management with SQLite

Remember: This is an educational project. Always use web scraping responsibly and ethically.
