Image Classification with Convolutional Neural Networks – Part 1: Why We Need CNNs

Published: February 9, 2026 at 02:43 PM EST
2 min read
Source: Dev.to

Why We Need CNNs

In this article, we will explore image classification using convolutional neural networks.
For this, we will use a simple example: recognizing an X or an O. We will start with the image of the letter O, represented as a 6 × 6 pixel grid.

It is possible to build a conventional (fully‑connected) neural network that can correctly classify this image. The typical approach is to flatten the 6 × 6 grid into a single column of 36 input nodes and connect these inputs to a hidden layer. For a single hidden node, this requires 36 connections, each with its own weight that must be learned via back‑propagation.

Usually, the first hidden layer contains more than one node, and each additional node adds another 36 weights to estimate. As the number of pixels grows, the number of weights grows proportionally, quickly becoming impractical for larger, more complex images.
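The arithmetic above can be sketched in a few lines. This is an illustrative helper, not code from the article; the image sizes beyond 6 × 6 are assumptions chosen to show how quickly the weight count grows.

```python
def fc_weight_count(width: int, height: int, hidden_nodes: int) -> int:
    """Number of weights from a flattened image to the first
    fully-connected hidden layer (bias terms ignored)."""
    return width * height * hidden_nodes

print(fc_weight_count(6, 6, 1))       # 36 weights for one hidden node
print(fc_weight_count(6, 6, 10))      # 360 weights for ten hidden nodes
print(fc_weight_count(224, 224, 10))  # 501,760 weights for a photo-sized input
```

Even a modest 224 × 224 grayscale input already requires half a million weights for a ten-node hidden layer, which is the scalability problem the next paragraph addresses.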

Because of this scalability issue, the classification of large and complicated images is usually performed with convolutional neural networks (CNNs). A CNN makes the problem tractable by:

  • Reducing the number of input parameters through weight sharing and local receptive fields.
  • Providing tolerance to small shifts in pixel locations (translation invariance).
  • Exploiting spatial correlations that naturally occur in images.

Returning to our O detection example, a CNN can learn local patterns (e.g., edges and curves) and combine them hierarchically to recognize the whole shape, using far fewer parameters than a fully‑connected network.
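To make weight sharing concrete, here is a minimal sketch of a single 3 × 3 filter slid over the 6 × 6 grid. The ring-shaped "O" image and the diagonal kernel are illustrative assumptions; the point is that the same 9 shared weights are reused at every position, so the parameter count does not grow with the image size.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2D convolution (cross-correlation), no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A crude 6x6 "O": a ring of ones (illustrative, not from the article).
o_image = [
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 1, 0],
]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]  # a hypothetical diagonal-edge detector: 9 weights total

feature_map = conv2d_valid(o_image, kernel)
print(len(feature_map), len(feature_map[0]))  # 4 4 -> a 4x4 map from a 6x6 input
```

Because the kernel detects the same local pattern wherever it appears, the output also tolerates small shifts in pixel positions, which is the translation tolerance mentioned above.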

We will explore how a convolutional neural network can help with this task in the next article.
