Implementation of the MADGRAD optimization algorithm for TensorFlow

I am pleased to present a TensorFlow implementation of the MADGRAD optimization algorithm, which Facebook AI published in the paper "Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization" (Aaron Defazio and Samy Jelassi, 2021). When the algorithm was first introduced, several people asked for a tf.keras implementation, so I wrote one. This implementation's main features include:
