Equilibrated adaptive learning rates for non-convex optimization
Source: Dev.to
Overview
Train deep learning models faster with a simple tweak: ESGD.
Deep networks often stall on plateaus and saddle points of the loss surface, and a single global step size cannot compensate. Adaptive learning rates help by giving each parameter its own step size, so slowly-moving parts of the model speed up while fast-moving parts are damped.
Older adaptive schemes can react badly when the surface curves upward in some directions and downward in others, as it does near saddle points, so they can even slow training down. Looking at how the surface bends gives a better clue, and an approach called equilibration balances that curvature across the whole model by scaling each parameter according to the corresponding row norm of the Hessian.
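A small numerical illustration of why equilibration is more robust than a plain diagonal preconditioner. The matrix below is an assumed toy example of an indefinite Hessian like one found near a saddle point; the names are illustrative, not from the paper's code.

```python
import numpy as np

# Assumed toy Hessian: indefinite (one positive, one negative eigenvalue),
# as occurs near a saddle point. Its first diagonal entry is nearly zero
# even though there is strong curvature involving that coordinate.
H = np.array([[0.01,  3.0],
              [3.0,  -2.0]])

jacobi = np.abs(np.diag(H))        # diagonal (Jacobi-style) preconditioner
equil = np.linalg.norm(H, axis=1)  # equilibration: row norms of the Hessian

# Dividing the gradient by the Jacobi term would inflate the first
# coordinate's step by ~100x; the row norm stays informative.
print(jacobi, equil)
```

The near-zero diagonal entry makes the Jacobi scaling explode, while the row norm still reflects the large off-diagonal curvature, which is the intuition behind equilibration.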
The resulting method, equilibrated SGD (ESGD), adapts step sizes using cheap stochastic estimates of those row norms, obtained from Hessian-vector products with random vectors. In the reported experiments, ESGD learns as fast as or faster than popular alternatives like RMSProp, and nearly always beats plain stochastic gradient descent.
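The update above can be sketched in a few lines. This is a minimal NumPy sketch, not the authors' implementation: it assumes access to a Hessian-vector product (in practice obtained via double backpropagation), and the function names, toy quadratic, and hyperparameter values are all illustrative choices.

```python
import numpy as np

def esgd(grad, hess_vec, theta0, lr=0.1, eps=1e-2, steps=500, seed=0):
    """Sketch of equilibrated SGD. D accumulates squared Hessian-vector
    products with random Gaussian v; E[(Hv)_i^2] equals the squared row
    norm of the Hessian, so sqrt(D/t) estimates the equilibration scaling."""
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float)
    D = np.zeros_like(theta)
    for t in range(1, steps + 1):
        v = rng.standard_normal(theta.shape)
        Hv = hess_vec(theta, v)           # one Hessian-vector product
        D += Hv ** 2                      # running row-norm estimate
        precond = np.sqrt(D / t) + eps    # per-parameter equilibrated scale
        theta -= lr * grad(theta) / precond
    return theta

# Toy ill-conditioned quadratic: f(theta) = 0.5 * theta^T A theta.
A = np.diag([100.0, 1.0])
theta = esgd(grad=lambda th: A @ th,
             hess_vec=lambda th, v: A @ v,
             theta0=np.array([1.0, 1.0]))
print(theta)  # both coordinates contract toward 0 at comparable rates
```

The point of the toy problem: with a plain global learning rate, the stiff coordinate (curvature 100) forces a tiny step that stalls the flat coordinate (curvature 1); the equilibrated scaling lets both make progress.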
If you want models that converge faster with less fiddling over step sizes, ESGD is a simple, drop-in change worth trying, especially when training is slow or unstable.
Read the full paper: Equilibrated adaptive learning rates for non-convex optimization.