Training Google's Reformer - takeaways, code, and weights

I used Google's Trax library to train a Reformer model with a context length of 65k on Wikipedia. This post includes the code, my takeaways, the trained weights, and samples from the model, as well as a repo and a Colab notebook that others can use to fine-tune it.