Ridge Regression vs Lasso Regression
Source: Dev.to
Introduction
Linear regression is one of the most fundamental tools in a data scientist’s toolkit. At its core lies Ordinary Least Squares (OLS), a method that estimates model parameters by minimizing the sum of squared differences between predicted and actual values.
In many real-world problems, such as house-price prediction, datasets often contain many features, correlated variables, and noisy inputs. In such cases, traditional OLS regression becomes unstable and prone to over-fitting. Regularisation techniques address these challenges. The two most widely used regularisation-based models are:
- Ridge Regression (L2 regularisation)
- Lasso Regression (L1 regularisation)
Ordinary Least Squares (OLS)
OLS estimates model parameters by minimising the sum of squared residuals between predicted and actual values:
[ \text{Loss}_{\text{OLS}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 ]
where (\hat{y}_i) represents the predicted price for observation (i).
OLS works well for small, clean datasets, but it struggles when:
- There are many features
- Features are highly correlated (multicollinearity)
- Data contains noise
These situations lead to over‑fitting: the model performs well on training data but poorly on unseen data.
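This instability is easy to reproduce on toy data (the data-generating process below is an illustrative assumption, not from the article): when two features are nearly identical copies of each other, the individual OLS coefficients become unreliable even though their combined effect is recovered.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)    # almost a copy of x1: severe multicollinearity
y = 3 * x1 + rng.normal(scale=0.1, size=n)  # true effect comes from x1 only

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The two coefficients are individually unstable (they can be large and
# opposite-signed), but their sum still approximates the true effect (~3).
print(coef, coef.sum())
```

Re-running with a different random seed typically produces very different individual coefficients, which is exactly the instability regularisation is meant to tame.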
Regularisation in Linear Regression
Regularisation adds a penalty term to the loss function, charging the model for complexity. The model must now balance accuracy against simplicity rather than merely minimising error.
[ \text{Loss} = \text{Error} + \text{Penalty} ]
Large coefficients are discouraged, which typically yields models that generalise better to new data.
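In code, this trade-off is literally the error term plus a penalty term. A minimal sketch (the helper name `penalized_loss` and the toy numbers are illustrative):

```python
import numpy as np

def penalized_loss(y, y_hat, beta, lam, norm="l2"):
    """Regularised loss: residual sum of squares plus a coefficient penalty."""
    rss = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)
    beta = np.asarray(beta, dtype=float)
    penalty = np.sum(beta ** 2) if norm == "l2" else np.sum(np.abs(beta))
    return rss + lam * penalty

# Same predictions, same error -- but large coefficients pay a higher L2 price.
print(penalized_loss([1, 2], [1, 1], beta=[2, -3], lam=1.0, norm="l2"))  # 1 + 13 = 14.0
print(penalized_loss([1, 2], [1, 1], beta=[2, -3], lam=1.0, norm="l1"))  # 1 + 5 = 6.0
```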
Ridge Regression (L2 Regularisation)
Ridge regression modifies the OLS loss function by adding an L2 penalty proportional to the sum of squared coefficients.
[ \text{Loss}_{\text{Ridge}} = \underbrace{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}_{\text{RSS}} \;+\; \lambda \underbrace{\sum_{j=1}^{p}\beta_j^{2}}_{\text{L2 penalty}} ]
- (\lambda \ge 0) is the regularisation parameter.
- The intercept (\beta_0) is not penalised.
Conceptual Effect
- Shrinks coefficients smoothly
- Reduces model variance
- Keeps all features
- Handles multicollinearity well
Key Property
Ridge does not perform feature selection; coefficients are reduced but never become exactly zero.
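This property can be checked directly on synthetic data (the true coefficients, two of which are exactly zero, and the deliberately strong penalty are assumptions for the demo): even then, Ridge shrinks every coefficient but zeroes none.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_beta = np.array([3.0, 0.0, 1.5, 0.0, -2.0])  # two features are truly irrelevant
y = X @ true_beta + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=100.0).fit(X, y)  # deliberately strong penalty
# All five coefficients are shrunk toward zero, but none becomes exactly zero.
print(ridge.coef_)
```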
Python Example
```python
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0)  # alpha corresponds to λ
ridge.fit(X_train_scaled, y_train)
y_pred_ridge = ridge.predict(X_test_scaled)
```
Lasso Regression (L1 Regularisation)
Lasso adds an L1 penalty, which is the sum of the absolute values of the coefficients.
[ \text{Loss}_{\text{Lasso}} = \underbrace{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}_{\text{RSS}} \;+\; \lambda \underbrace{\sum_{j=1}^{p}|\beta_j|}_{\text{L1 penalty}} ]
- (\lambda) controls the strength of regularisation.
Conceptual Effect
- Creates sparse models
- Forces some coefficients to be exactly zero
- Automatically removes weak features
Key Property
Lasso performs feature selection, producing simpler and more interpretable models.
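The same synthetic setup used above makes this visible (true coefficients and penalty strength are illustrative assumptions): the truly irrelevant features end up with coefficients of exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_beta = np.array([3.0, 0.0, 1.5, 0.0, -2.0])  # two features are truly irrelevant
y = X @ true_beta + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
# The irrelevant features' coefficients are driven to exactly zero;
# the strong features survive, shrunk in magnitude.
print(lasso.coef_)
```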
Python Example
```python
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.1)  # alpha corresponds to λ
lasso.fit(X_train_scaled, y_train)
y_pred_lasso = lasso.predict(X_test_scaled)
```
Comparing Ridge and Lasso
| Aspect | Ridge | Lasso |
|---|---|---|
| Feature selection | Retains all features (coefficients are shrunken) | Sets some coefficients to zero → automatic selection |
| Behaviour with correlated features | Distributes weight smoothly among correlated predictors | Picks one predictor, zeroes out the others |
| Interpretability | “Price depends on all 10 factors with varying importance.” | “Price primarily depends on size, location, and age; other factors don’t matter.” |
Example with two correlated predictors (size and number of rooms, (r = 0.85)):
- Ridge: Size = $120/sq ft, Rooms = $8,000/room (both retained)
- Lasso: Size = $180/sq ft, Rooms = $0 (chooses one, drops the other)
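The contrast can be reproduced on toy data. Everything below is an illustrative assumption (it does not reproduce the dollar figures above): size is taken as the true price driver, rooms is constructed to correlate with it at roughly 0.85, and the penalty strengths are picked for demonstration.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n = 200
size = rng.normal(size=n)
rooms = 0.85 * size + np.sqrt(1 - 0.85**2) * rng.normal(size=n)  # corr(size, rooms) ≈ 0.85
price = 4.0 * size + rng.normal(scale=0.5, size=n)               # price truly driven by size

X = np.column_stack([size, rooms])
ridge = Ridge(alpha=50.0).fit(X, price)
lasso = Lasso(alpha=1.0).fit(X, price)

print("Ridge:", ridge.coef_)  # weight spread across both correlated predictors
print("Lasso:", lasso.coef_)  # the redundant predictor is driven to (or very near) zero
```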
Application Scenario: House‑Price Prediction
Assume the dataset contains:
- House size
- Number of bedrooms
- Distance to the city centre
- Number of nearby schools
- Several noisy or weak features
When to use Ridge
- Most features are expected to influence price
- Multicollinearity is present
- You need stable predictions
When to use Lasso
- Only a few features truly matter
- Many variables add noise
- Model interpretability is important
Python Implementation
Data Preparation
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error

# Assume df is a pandas DataFrame containing the data
X = df[['size', 'bedrooms', 'distance_city', 'schools_nearby', 'noise_feature']]
y = df['price']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
OLS Model
```python
ols = LinearRegression()
ols.fit(X_train_scaled, y_train)
y_pred_ols = ols.predict(X_test_scaled)
mse_ols = mean_squared_error(y_test, y_pred_ols)
print(f'OLS MSE: {mse_ols:.2f}')
```
Ridge Model
```python
ridge = Ridge(alpha=1.0)
ridge.fit(X_train_scaled, y_train)
y_pred_ridge = ridge.predict(X_test_scaled)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f'Ridge MSE: {mse_ridge:.2f}')
```
Lasso Model
```python
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_scaled, y_train)
y_pred_lasso = lasso.predict(X_test_scaled)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
print(f'Lasso MSE: {mse_lasso:.2f}')
```
Choosing the Right Model for House Prices
- Ridge Regression – preferred when all features contribute meaningfully (e.g., size, bedrooms, schools, distance).
- Lasso Regression – more suitable when only a few features are truly important and the rest add noise, thanks to its built‑in feature‑selection capability.
Model Evaluation and Overfitting Detection
Overfitting can be detected by comparing training and testing performance:
- High training score but low test score → overfitting.
- Similar training and test scores → good generalisation.
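A sketch of this check on synthetic data (the sine-wave target, the deliberately over-flexible degree-15 polynomial, and the penalty strength are all illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)

# A degree-15 polynomial is far more flexible than 40 noisy points warrant.
X_poly = PolynomialFeatures(degree=15, include_bias=False).fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_poly, y, test_size=0.5, random_state=0)

ols = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)

# A large train/test gap signals overfitting; regularisation usually narrows it.
print(f"OLS   R² train={ols.score(X_tr, y_tr):.3f}  test={ols.score(X_te, y_te):.3f}")
print(f"Ridge R² train={ridge.score(X_tr, y_tr):.3f}  test={ridge.score(X_te, y_te):.3f}")
```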
Residual analysis also plays a key role. Residuals should be randomly distributed; visible patterns may indicate missing variables or non‑linear relationships.
Conclusion
- OLS is simple but prone to overfitting in complex datasets.
- Ridge and Lasso regression introduce regularisation to improve stability and generalisation.
- Ridge is best when all features matter.
- Lasso is preferred for sparse, interpretable models.
Understanding when and how to apply these techniques is essential for both exams and real‑world machine‑learning problems.