UltimateAdam for Tensorflow - transition from SGD to Adam

Follow the full discussion on Reddit.
Hi all. I've been playing around with a new optimizer that I've had some good success with. We now know that one of Adam's main problems is that the variance of its adaptive learning rates in the early steps of optimization causes a sub-optimal trajectory, which ultimately leads to convergence at a sub-optimal final minimum. This is a large part of why many people consider SGD the gold standard for final performance.
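
The post doesn't include the implementation itself, so here is a minimal sketch of what an SGD-to-Adam transition could look like in TensorFlow 2. The class name `SGDToAdam`, the hard switch at `switch_step`, and all hyperparameters are my own illustrative assumptions, not the actual UltimateAdam code:

```python
import tensorflow as tf

class SGDToAdam:
    """Hypothetical sketch of the idea (not the author's implementation):
    run plain SGD for the first `switch_step` updates, then switch to Adam
    once its adaptive second-moment estimates are backed by enough
    gradient history to avoid the early-step variance problem."""

    def __init__(self, switch_step=1000, sgd_lr=0.1, adam_lr=1e-3):
        self.switch_step = switch_step
        self.sgd = tf.keras.optimizers.SGD(learning_rate=sgd_lr)
        self.adam = tf.keras.optimizers.Adam(learning_rate=adam_lr)
        self.step = 0  # number of updates applied so far

    def apply_gradients(self, grads_and_vars):
        grads_and_vars = list(grads_and_vars)
        # Use SGD early, Adam after the transition point.
        active = self.sgd if self.step < self.switch_step else self.adam
        active.apply_gradients(grads_and_vars)
        self.step += 1

# Toy usage: fit y = 3x with an eager training loop on synthetic data.
x = tf.random.normal([256, 1])
y = 3.0 * x
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = SGDToAdam(switch_step=100)

for _ in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

A hard switch is the simplest reading of "transition"; a smoother variant could instead interpolate between the SGD and Adam parameter updates over a window of steps.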

