Create a folder called .model in the same folder as this file and place the appropriate finetuned GPT-2 model (see the Models section)
inside it (.model/GPT2-rap-recommended/config.json, pytorch...). The model is available here.
You also need hardware capable of running GPT-2.
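Once the checkpoint sits under .model/, it can be loaded with the transformers library. The sketch below is illustrative, not the project's actual generation script: the helper names and sampling parameters are assumptions, only the directory layout comes from the instructions above.

```python
from pathlib import Path

# Directory layout described in the setup instructions above.
MODEL_DIR = Path(".model/GPT2-rap-recommended")

def model_files_present(model_dir: Path) -> bool:
    """Check that the finetuned checkpoint is where the loader expects it."""
    return (model_dir / "config.json").is_file()

def generate_rap(prompt: str, max_new_tokens: int = 100) -> str:
    """Sample a continuation from the finetuned GPT-2 (sketch)."""
    # Deferred import so the file can be read without transformers installed.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer
    tokenizer = GPT2Tokenizer.from_pretrained(str(MODEL_DIR))
    model = GPT2LMHeadModel.from_pretrained(str(MODEL_DIR))
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampling usually gives livelier lyrics than greedy decoding.
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=True, top_p=0.95)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    if model_files_present(MODEL_DIR):
        print(generate_rap("Started from the bottom"))
    else:
        print("Place the finetuned model under .model/ first (see above).")
```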
Data Documentation
We gathered raps from genius.com, ohhla.com and battlerap.com. For genius.com we used the official API
(see the GeniusLyrics and GetRankings repos), while battlerap.com and ohhla.com were scraped with purpose-built scrapy spiders.
In total we gathered ~70k raps, which we used for finetuning. GPT-2 was finetuned on all lyrics concatenated into one large text, while T5 was finetuned
on prompts of the form KEYWORDS: <keywords> RAP-LYRICS: <rap text>, which proved insufficient for our task.
We eventually chose the finetuned GPT-2 model. Both the experimental and the final scripts can be found in ./preprocessing/finetunging.
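The T5 prompt format described above can be reconstructed as a small serialization helper; the function name and keyword separator are assumptions for illustration.

```python
def build_t5_prompt(keywords, rap_text):
    """Serialize one training example in the KEYWORDS/RAP-LYRICS prompt format."""
    return f"KEYWORDS: {', '.join(keywords)} RAP-LYRICS: {rap_text}"

# Example:
prompt = build_t5_prompt(["hustle", "city"], "I was out here grinding in the city")
```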
Additionally, a RoBERTa model was finetuned on data from the English Wikipedia, hate-speech tweets, the CNN/DailyMail dataset,
and 4k rap lyrics (available under Data) to classify the quality of the generated raps.
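Such a quality classifier can be used to filter or rank generations. A minimal sketch, assuming a RoBERTa sequence-classification checkpoint at a hypothetical path and that label index 1 means "good rap" (both assumptions, not documented by this project):

```python
def score_rap(text: str) -> float:
    """Probability that `text` is a good rap, per the quality classifier.

    The checkpoint path and label index below are assumptions for illustration.
    """
    import torch
    from transformers import RobertaForSequenceClassification, RobertaTokenizer
    path = ".model/roberta-rap-quality"  # hypothetical location
    tokenizer = RobertaTokenizer.from_pretrained(path)
    model = RobertaForSequenceClassification.from_pretrained(path)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # class 1 = "good"

def pick_best(raps, scorer=score_rap):
    """Of several candidate generations, keep the one the scorer rates highest."""
    return max(raps, key=scorer)
```

In practice one would generate several candidates with GPT-2 and call pick_best on them to surface the strongest verse.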