Equilibrated adaptive learning rates for non-convex optimization
Source: Dev.to
Overview
Train deep learning models faster with a simple tweak: ESGD.
Deep networks often stall on plateaus and saddle points of the loss surface, and a single global step size cannot compensate. Adaptive learning rates help by giving each parameter its own step size, so slowly-moving parts of the model speed up while fast-moving parts are damped.
Older adaptive schemes can react badly when the surface curves upward in some directions and downward in others, as it does near saddle points, so they can even slow training down. Looking at how the surface bends gives a better clue, and an approach called equilibration balances that curvature across the whole model by scaling each parameter according to the corresponding row norm of the Hessian.
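A small numerical illustration of why equilibration is more robust than a plain diagonal preconditioner. The matrix below is an assumed toy example of an indefinite Hessian like one found near a saddle point; the names are illustrative, not from the paper's code.

```python
import numpy as np

# Assumed toy Hessian: indefinite (one positive, one negative eigenvalue),
# as occurs near a saddle point. Its first diagonal entry is nearly zero
# even though there is strong curvature involving that coordinate.
H = np.array([[0.01,  3.0],
              [3.0,  -2.0]])

jacobi = np.abs(np.diag(H))        # diagonal (Jacobi-style) preconditioner
equil = np.linalg.norm(H, axis=1)  # equilibration: row norms of the Hessian

# Dividing the gradient by the Jacobi term would inflate the first
# coordinate's step by ~100x; the row norm stays informative.
print(jacobi, equil)
```

The near-zero diagonal entry makes the Jacobi scaling explode, while the row norm still reflects the large off-diagonal curvature, which is the intuition behind equilibration.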
The resulting method, equilibrated SGD (ESGD), adapts step sizes using cheap stochastic estimates of those row norms, obtained from Hessian-vector products with random vectors. In the reported experiments, ESGD learns as fast as or faster than popular alternatives like RMSProp, and nearly always beats plain stochastic gradient descent.
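The update above can be sketched in a few lines. This is a minimal NumPy sketch, not the authors' implementation: it assumes access to a Hessian-vector product (in practice obtained via double backpropagation), and the function names, toy quadratic, and hyperparameter values are all illustrative choices.

```python
import numpy as np

def esgd(grad, hess_vec, theta0, lr=0.1, eps=1e-2, steps=500, seed=0):
    """Sketch of equilibrated SGD. D accumulates squared Hessian-vector
    products with random Gaussian v; E[(Hv)_i^2] equals the squared row
    norm of the Hessian, so sqrt(D/t) estimates the equilibration scaling."""
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float)
    D = np.zeros_like(theta)
    for t in range(1, steps + 1):
        v = rng.standard_normal(theta.shape)
        Hv = hess_vec(theta, v)           # one Hessian-vector product
        D += Hv ** 2                      # running row-norm estimate
        precond = np.sqrt(D / t) + eps    # per-parameter equilibrated scale
        theta -= lr * grad(theta) / precond
    return theta

# Toy ill-conditioned quadratic: f(theta) = 0.5 * theta^T A theta.
A = np.diag([100.0, 1.0])
theta = esgd(grad=lambda th: A @ th,
             hess_vec=lambda th, v: A @ v,
             theta0=np.array([1.0, 1.0]))
print(theta)  # both coordinates contract toward 0 at comparable rates
```

The point of the toy problem: with a plain global learning rate, the stiff coordinate (curvature 100) forces a tiny step that stalls the flat coordinate (curvature 1); the equilibrated scaling lets both make progress.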
If you want models that converge faster with less fiddling over step sizes, ESGD is a simple, drop-in change worth trying, especially when training is slow or unstable.
Read the full paper: Equilibrated adaptive learning rates for non-convex optimization.