Welcome to the Universities Email Extractor & Processor! This is a collection of utilities to extract, combine, split, and process email data related to colleges and universities worldwide.
- Split Data: Separate combined CSV files into distinct `colleges_only.csv` and `universities_only.csv` files.
- Combine Emails: Aggregate emails from multiple directories and deduplicate them to generate a master contact list.
- Refinement: Filter out specific sets of emails (e.g., separating US organizations from worldwide entries) to ensure unique, targeted lists.
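
The combine and refine steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in `combineEmails.js` or `refine.js`: the function names (`normalizeEmail`, `combineEmails`, `excludeEmails`), the lowercase-and-trim normalization rule, and the sample addresses are all assumptions made for the example.

```javascript
// Hypothetical helpers: names and normalization rules are assumptions,
// not taken from combineEmails.js or refine.js.

// Normalize an address so case and surrounding whitespace do not create false duplicates.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

// Merge several email lists into one deduplicated array, keeping first-seen order.
function combineEmails(lists) {
  const seen = new Set();
  for (const list of lists) {
    for (const email of list) {
      const normalized = normalizeEmail(email);
      if (normalized) seen.add(normalized); // skip empty entries
    }
  }
  return [...seen];
}

// Return the (normalized, deduplicated) entries of `all` that are not in `exclude`.
function excludeEmails(all, exclude) {
  const excluded = new Set(exclude.map(normalizeEmail));
  return combineEmails([all]).filter((email) => !excluded.has(email));
}

// Example with made-up addresses from two "regional" lists.
const worldwide = combineEmails([
  ['info@example.edu', 'Admissions@Example.ac.uk '],
  ['INFO@example.edu', 'contact@example.org'],
]);
const usOnly = ['info@example.edu'];

console.log(worldwide);
console.log(excludeEmails(worldwide, usOnly));
```

A `Set` is used here because it keeps insertion order and makes membership checks cheap, which matters when the lists are large.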
- `split_college_university.js`: Reads data from multiple CSV files, processes it based on "college" and "university" keywords, and outputs separated CSV files.
- `combineEmails.js`: Collects emails from multiple regional datasets, deduplicates them, and aggregates them into a final file.
- `refine.js`: Cross-references and deduplicates massive sets of university emails to build clean, filtered outputs.
- `extractor_scripts/`: Contains extraction-specific utility scripts.
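
As an illustration of the keyword-based split, here is a small sketch. It is not the actual logic of `split_college_university.js`: the function name `splitByKeyword`, the assumption that the institution name is the first column, the naive comma split (no quoted-field handling), and the rule that "university" wins when a name contains both keywords are all choices made for this example.

```javascript
// Hypothetical sketch: column layout and matching rule are assumptions,
// not taken from split_college_university.js.
function splitByKeyword(csvText) {
  const [header, ...rows] = csvText.trim().split(/\r?\n/);
  const colleges = [header];
  const universities = [header];

  for (const row of rows) {
    // Assumes the institution name is the first column and contains no commas.
    const name = row.split(',')[0].toLowerCase();
    if (name.includes('university')) {
      universities.push(row);
    } else if (name.includes('college')) {
      colleges.push(row);
    }
    // Rows matching neither keyword are dropped in this sketch.
  }

  return {
    colleges: colleges.join('\n'),
    universities: universities.join('\n'),
  };
}

// Example with made-up rows.
const sample = [
  'name,email',
  'Example College,info@example.edu',
  'Example University,contact@example.org',
  'Example Institute,hello@example.net',
].join('\n');

const result = splitByKeyword(sample);
console.log(result.colleges);
console.log(result.universities);
```

The two strings could then be written out with `fs.writeFileSync`, for example to `colleges_only.csv` and `universities_only.csv`. Real-world CSV data with quoted fields would need a proper CSV parser rather than a plain comma split.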
Make sure you have Node.js installed on your machine.
Clone this repository to your local machine:
git clone https://github.com/ghoseyy/universities.git
cd universities

To run any of the scripts, use Node.js:
node split_college_university.js
node combineEmails.js
node refine.js

We love contributions from the community! If you're interested in improving these scripts or adding new data sources, please check out our Contributing Guidelines.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is open-source. Feel free to use, modify, and distribute as you see fit!