Follow the full discussion on Reddit.
Hi all. I've been playing around with a new optimizer that I've had some good success with. We now know that one of the main problems with Adam is that the variance of the adaptive learning rates in the early steps of optimization causes a sub-optimal trajectory, which ultimately leads to convergence to a sub-optimal final minimum. As it turns out, this is the main reason why many people say that SGD is the gold standard for final performance.
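To make the mechanism concrete, here is a minimal NumPy sketch of a single Adam update with an optional linear learning-rate warmup, one common workaround for the noisy adaptive rates in the first few steps. This is an illustration only, not the optimizer from the post; the function name and the `warmup_steps` parameter are assumptions for the example.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, warmup_steps=0):
    """One Adam update on `param` at step t (1-indexed).

    Early in training, v is estimated from very few gradients, so the
    per-parameter adaptive rate lr / (sqrt(v_hat) + eps) has high
    variance; `warmup_steps` (illustrative knob) linearly ramps lr to
    damp those first, least-reliable updates.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    if warmup_steps:
        lr = lr * min(1.0, t / warmup_steps)    # linear warmup
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimizing f(x) = x^2 from x = 1.0 as a toy usage example.
p, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 51):
    p, m, v = adam_step(p, 2 * p, m, v, t, lr=0.1, warmup_steps=10)
```

At step 1 the bias-corrected update reduces to roughly `lr * sign(grad)` regardless of gradient scale, which is exactly why an over-confident early step can push the trajectory somewhere hard to recover from; the warmup simply shrinks those steps.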