ghoseyy/universities

Universities Email Extractor & Processor

Welcome to the Universities Email Extractor & Processor! This is a collection of utilities to extract, combine, split, and process email data related to colleges and universities worldwide.

🚀 Features

  • Split Data: Separate combined CSV files into distinct colleges_only.csv and universities_only.csv formats.
  • Combine Emails: Aggregate emails from multiple directories and deduplicate them to generate a master contact list.
  • Refinement: Filter out specific sets of emails (e.g., separating US organizations from worldwide entries) to ensure unique, targeted lists.
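The combine and refinement features above come down to set operations on email strings. The following is a minimal sketch of that idea, not the repository's actual code: the function names (normalizeEmail, combineEmails, excludeEmails) are hypothetical, and treating addresses as case-insensitive is an assumption.

```javascript
// Hypothetical helpers for illustration; names are not taken from the repository.

// Assumption: emails are compared case-insensitively and without surrounding whitespace.
function normalizeEmail(email) {
  return email.trim().toLowerCase();
}

// Merge several email lists into one deduplicated list, keeping first-seen order.
function combineEmails(lists) {
  const seen = new Set();
  for (const list of lists) {
    for (const raw of list) {
      const email = normalizeEmail(raw);
      if (email) seen.add(email); // skip empty entries
    }
  }
  return [...seen];
}

// Return the emails in `all` that do not appear in `exclude`
// (for example, worldwide entries minus US entries).
function excludeEmails(all, exclude) {
  const blocked = new Set(exclude.map(normalizeEmail));
  return all.filter((email) => !blocked.has(normalizeEmail(email)));
}

const combined = combineEmails([
  ["Admissions@Example.edu ", "info@sample.ac.uk"],
  ["admissions@example.edu"],
]);
const worldwideOnly = excludeEmails(combined, ["admissions@example.edu"]);
console.log(combined, worldwideOnly);
```

A Set keeps both steps linear in the number of emails, which matters more as the lists grow.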

📂 Project Structure

  • split_college_university.js: Reads data from multiple CSV files, processes it based on "college" and "university" keywords, and outputs separated CSV files.
  • combineEmails.js: Collects emails from multiple regional datasets, deduplicates them, and aggregates them into a final file.
  • refine.js: Cross-references sets of university emails against each other and removes duplicates to produce clean, filtered output files.
  • extractor_scripts/: Contains extraction-specific utility scripts.
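As a rough illustration of the keyword-based split that split_college_university.js is described as performing, here is a minimal sketch. It assumes each CSV row has already been parsed into an object with a name field; that field name, the splitByKeyword function, and the rule that "university" takes precedence when both keywords appear are all assumptions, not details confirmed by the repository.

```javascript
// Hypothetical sketch; the row shape ({ name, email }) is an assumption.
function splitByKeyword(rows) {
  const colleges = [];
  const universities = [];
  for (const row of rows) {
    const name = (row.name || "").toLowerCase();
    // Assumption: when both keywords appear, the row counts as a university.
    if (name.includes("university")) {
      universities.push(row);
    } else if (name.includes("college")) {
      colleges.push(row);
    }
    // Rows matching neither keyword are dropped in this sketch.
  }
  return { colleges, universities };
}

const { colleges, universities } = splitByKeyword([
  { name: "Example University", email: "info@example.edu" },
  { name: "Sample Community College", email: "hello@sample.edu" },
  { name: "Institute of Examples", email: "contact@institute.example" },
]);
console.log(colleges.length, universities.length);
```

Each resulting array could then be written to its own file, such as colleges_only.csv and universities_only.csv.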

🛠️ Getting Started

Prerequisites

Make sure you have Node.js installed on your machine.

Installation

Clone this repository to your local machine:

git clone https://github.com/ghoseyy/universities.git
cd universities

Usage

Run each script with Node.js from the repository directory:

node split_college_university.js
node combineEmails.js
node refine.js

🤝 Contributing

We love contributions from the community! If you're interested in improving these scripts or adding new data sources, please check out our Contributing Guidelines.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

🛡️ License

This project is open-source. Feel free to use, modify, and distribute as you see fit!

About

A powerful set of data extraction and processing utilities for filtering, deduplicating, and mapping global university and college email datasets.
