Skip to content

jaulie/RNN_text_generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 

Repository files navigation

RNN Decoder Model with TinyStories Corpus

This project builds & trains a bigram decoder model using the TinyStories corpus for text generation. The TinyStories corpus can be found here. This corpus contains stories synthetically generated by Chat-GPT using a limited vocabulary. The authors of the paper introducing this corpus explain in their paper their goal to build a smaller generative model which still outputs realistic text like a large language model. TinyStories is a synthetic dataset built using a large language model like GPT-3.5. TinyStories can be used to train language models that are much smaller than state-of-the-art models. The paper is available here.

The traditional transformer architecture consists of an encoder and a decoder model. In an encoder model, input text is vectorized before being passed through various transformation layers, outputting a word embedding. Encoders are responsible for extracting relevant information from the input text which is then passed to the decoder. The decoder interprets a result based on the continuous representation passed from the encoder. The decoder converts essentially works the other way around, taking the output from the encoder, and converting it into its corresponding input. For a text generative model, the decoder would take an embedding and output a word or character.

Since the seminal paper by Vaswani et al., which introduced the transformer model, many encoder- and decoder-only transformer architectures have become available. For example, BERT is pre-trained using masked language modeling and next-sentence prediction tasks. Since the purpose of BERT (and later RoBERTa) is to perform tasks like sentence classification and question answering that require understanding the input text but not generating new text sequences, its design only incorporates an encoder. Encoder architectures are useful when tasks require only understanding. Because of this narrower focus, these models can become more efficient and specialized. GPT is a decoder-only model because it is based on the decoder part of the original Transformer architecture; its focus is to generate text. Decoder models are autoregressive, in that they generate output one part at a time, using their previous outputs as input for their next steps. As a result, GPT's architecture is tailored for these kinds of autoregressive applications.

In this project, based on this tutorial from Tensorflow, the recurrent neural network decoder model generates text by predicting the next character, given the previous internal state of the model. It's next predicted character is only dependent on the previous internal state, or the previously predicted character, so it is also a bigram model.

About

Generates text based off tiny stories corpus using a Recurrent Neural Network based decoder model

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors