
Training Massive FMs

What does it take to train massive foundation LLMs (>1B parameters)?

We are essentially distilling trillions of bytes of information into roughly ~0.001 times the storage. This requires a rather sophisticated compression algorithm to retain the important aspects of the data! What gets retained, and what gets lost? I've had ample experience fine-tuning with and without adapters. That procedure, however, operates at a stage where the model has already absorbed basic language-modelling capabilities such as grammar, typography, sentence structure, etc. This project will explore the fundamentals of pretraining, and of training at scale in general. I will start by exploring the methodologies that make training at scale possible.
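The ~0.001x figure can be sanity-checked with a quick back-of-the-envelope calculation. The numbers below are illustrative assumptions (a 1B-parameter model stored in fp16/bf16, trained on a couple of trillion bytes of raw text), not figures from this repository:

```python
# Back-of-the-envelope check of the "~0.001x storage" claim.
# All quantities are assumed, order-of-magnitude values.
data_bytes = 2e12           # ~2 trillion bytes of raw pretraining text
params = 1e9                # a 1B-parameter model
bytes_per_param = 2         # fp16/bf16 weights

model_bytes = params * bytes_per_param   # 2e9 bytes = ~2 GB of weights
ratio = model_bytes / data_bytes

print(f"model/data storage ratio = {ratio:.4f}")  # prints 0.0010
```

Under these assumptions the weights take up about one-thousandth of the space of the data they were trained on, which is where the "distillation" framing comes from.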

[DISCLAIMER] Some projects in this repository, such as MLA, are not used directly during training, but I think they are still useful for scaling these models.

