Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed

Follow the full discussion on Reddit.
I needed to finetune the GPT2 1.5 Billion parameter model for a project, but the model didn't fit on my gpu. So i figured out how to run it with deepspeed and gradient checkpointing, which reduces the required GPU memory. Now it can fit on just one GPU.

Visit Website

Discover the Best of Machine Learning.

Ever having issues keeping up with everything that's going on in Machine Learning? That's where we help. We're sending out a weekly digest, highlighting the Best of Machine Learning.

Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed

Comments

Discover the Best of Machine Learning.