nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch

Follow the full discussion on Reddit.
We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model implemented in HuggingFace. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).

Comments

There's unfortunately not much to read here yet...

Discover the Best of Machine Learning.

Ever having issues keeping up with everything that's going on in Machine Learning? That's where we help. We're sending out a weekly digest, highlighting the Best of Machine Learning.

Join over 900 Machine Learning Engineers receiving our weekly digest.

Best of Machine LearningBest of Machine Learning

Discover the best guides, books, papers and news in Machine Learning, once per week.

Twitter