This project is an optimized email scraper written in Python. It crawls through websites starting from a user-provided URL and extracts email addresses. The script has been refactored for improved efficiency, modularity, and maintainability.
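As a rough illustration of the extraction step, the sketch below pulls email-like strings out of page text with a regular expression. The names `EMAIL_PATTERN` and `extract_emails` are hypothetical. The regex is a pragmatic approximation chosen for this example: it is not necessarily what `emailscraper.py` uses, and it does not validate addresses against RFC 5322.

```python
from __future__ import annotations

import re

# Hypothetical pattern: a pragmatic approximation of email addresses,
# not a full RFC 5322 validator and not necessarily the project's own regex.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> set[str]:
    """Return the unique email-like strings found in text (HTML or plain)."""
    return set(EMAIL_PATTERN.findall(text))


if __name__ == "__main__":
    sample = '<a href="mailto:info@example.com">Contact</a> or sales@example.org.'
    print(sorted(extract_emails(sample)))  # ['info@example.com', 'sales@example.org']
```

A set removes duplicates, so an address that appears several times on a page is reported once.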
- Modular Design: Separated logic into functions for clarity and reusability.
- Efficient URL Handling: Uses `urllib.parse.urljoin` to resolve relative links against the current page's URL.
- Robust Error Handling: Manages connection errors and unexpected interruptions.
- Command-Line Interface: Simple prompts for user inputs.
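The URL-handling feature can be sketched as follows, assuming links come from `<a href>` attributes. To keep the example dependency-free, it uses the standard library's `html.parser.HTMLParser` in place of BeautifulSoup. `LinkCollector` and `collect_links` are hypothetical names, not the project's API.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_links(base_url: str, html: str) -> list[str]:
    """Resolve every <a href> against base_url and keep unique http(s) URLs."""
    parser = LinkCollector()
    parser.feed(html)
    links: list[str] = []
    for href in parser.hrefs:
        # urljoin handles relative paths ("about.html"), root-relative paths
        # ("/contact") and absolute URLs uniformly; urldefrag drops "#section".
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)
    return links
```

For example, with the base URL `https://example.com/docs/index.html`, the href `/contact` resolves to `https://example.com/contact` and `about.html#team` resolves to `https://example.com/docs/about.html`. Non-HTTP links such as `mailto:` are dropped.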
- Python 3.x
- The following Python packages (see `requirements.txt`):
  - beautifulsoup4
  - requests
  - pyfiglet
- Clone the repository, then change into its directory: `git clone https://github.com/Irfan3006/emailscraper.git` followed by `cd emailscraper`.
- Install the required packages: `pip3 install -r requirements.txt`
To run the email scraper, execute the script: `python3 emailscraper.py`

You will be prompted to enter:
- The starting URL.
- The number of pages to crawl.
The program will then display the emails it finds during the crawl.
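The overall crawl can be sketched as a breadth-first search that stops after the number of pages entered at the prompt. This is a minimal sketch under several assumptions:

- `crawl` and its `fetch` parameter are hypothetical.
- Links are found with a simple regex rather than BeautifulSoup.
- Connection failures surface as `OSError` subclasses. This holds for `requests`, whose exceptions derive from `requests.exceptions.RequestException`, a subclass of `IOError`/`OSError`.

Pressing Ctrl+C stops the crawl and keeps the addresses found so far.

```python
from __future__ import annotations

import re
from collections import deque
from typing import Callable
from urllib.parse import urldefrag, urljoin

# Simplified patterns for a self-contained sketch; a real crawler should parse
# HTML properly (e.g. with BeautifulSoup) instead of scanning it with a regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HREF_RE = re.compile(r"""href=["']([^"']+)["']""", re.IGNORECASE)


def crawl(start_url: str, max_pages: int, fetch: Callable[[str], str]) -> set[str]:
    """Breadth-first crawl from start_url, visiting at most max_pages pages."""
    queue: deque[str] = deque([start_url])
    visited: set[str] = set()
    emails: set[str] = set()

    try:
        while queue and len(visited) < max_pages:
            url = queue.popleft()
            if url in visited:
                continue
            visited.add(url)  # a failed page still counts toward max_pages
            try:
                html = fetch(url)
            except OSError as exc:  # requests' exceptions subclass OSError
                print(f"[!] Skipping {url}: {exc}")
                continue
            # Print each new address as soon as it is found.
            for email in EMAIL_RE.findall(html):
                if email not in emails:
                    emails.add(email)
                    print(email)
            # Queue absolute, fragment-free http(s) links not yet visited.
            for href in HREF_RE.findall(html):
                link, _fragment = urldefrag(urljoin(url, href))
                if link.startswith(("http://", "https://")) and link not in visited:
                    queue.append(link)
    except KeyboardInterrupt:
        print("[-] Interrupted; returning the emails found so far.")
    return emails
```

In the real script, `fetch` could wrap `requests.get(url, timeout=10)`, call `raise_for_status()` on the response, and return its `.text`. The timeout value is an assumption. The two prompts would supply `start_url` and `max_pages` (the latter converted with `int()`).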
Contributions are welcome! Feel free to fork the repository and submit pull requests. For major changes, please open an issue first to discuss what you would like to change.
Distributed under the MIT License. See the LICENSE file for more information.
