Understanding LSTMs – Part 4: How LSTM Decides What to Forget

Published: February 25, 2026 at 03:46 PM EST

Source: Dev.to

In the previous article, we worked through the first stage of the LSTM and calculated its output. Let us continue.

Forget Gate

When the input was 1, we obtained a certain result.
If we change the input to a relatively large negative number, such as −10, the value fed into the sigmoid activation function (its x-axis coordinate) becomes strongly negative, so the sigmoid's output will be close to 0.

In that case the long‑term memory is almost completely forgotten, because anything multiplied by a value close to 0 is itself close to 0. Since the sigmoid activation function converts any input into a value between 0 and 1, its output determines what percentage of the long‑term memory is retained.

Thus, the first stage of the LSTM decides what percentage of the long‑term memory is remembered. This part is called the forget gate.
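The forget gate described above can be sketched in a few lines of Python. This is only an illustration: the weights, bias, and memory values below are hypothetical placeholders, not the numbers used in this series, and the names forget_fraction and apply_forget_gate are made up for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters, chosen only for illustration (assumption:
# the article's actual weights and bias are not shown in this part).
W_SHORT = 1.0  # weight on the short-term memory
W_INPUT = 1.0  # weight on the input
BIAS = 0.0


def forget_fraction(short_term: float, input_value: float) -> float:
    """Fraction of the long-term memory to keep, between 0 and 1."""
    # This weighted sum is the "x-axis value" fed into the sigmoid.
    z = W_SHORT * short_term + W_INPUT * input_value + BIAS
    return sigmoid(z)


def apply_forget_gate(long_term: float, short_term: float, input_value: float) -> float:
    """Scale the long-term memory by the fraction the gate decides to keep."""
    return long_term * forget_fraction(short_term, input_value)


long_term, short_term = 2.0, 1.0  # hypothetical memory values
for x in (1.0, -10.0):
    kept = forget_fraction(short_term, x)
    remaining = apply_forget_gate(long_term, short_term, x)
    print(f"input={x:6.1f}  kept={kept:.4f}  long-term after gate={remaining:.4f}")
```

With these placeholder values, an input of 1 keeps most of the long-term memory, while an input of −10 drives the kept fraction close to 0, which matches the behaviour described above.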

Second Stage

In the second stage, the block on the right combines the short‑term memory and the input to create a potential long‑term memory. The block on the left then determines what percentage of that potential memory should be added to the long‑term memory.
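As a rough sketch of this second stage, the snippet below builds a potential long-term memory and then decides how much of it to add. It assumes the standard LSTM formulation, in which the potential memory comes from a tanh activation and the percentage comes from a sigmoid; the text above does not name these activations. All weights, biases, and memory values are hypothetical placeholders, and the function name second_stage is made up for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters for the two blocks (not the article's numbers).
CAND_W_SHORT, CAND_W_INPUT, CAND_BIAS = 1.0, 1.0, 0.0  # block on the right
GATE_W_SHORT, GATE_W_INPUT, GATE_BIAS = 1.0, 1.0, 0.0  # block on the left


def second_stage(short_term: float, input_value: float) -> tuple[float, float]:
    """Return (potential long-term memory, fraction of it to add)."""
    # Right block: combine short-term memory and input into a candidate.
    # Assumption: tanh activation, as in the standard LSTM.
    candidate = math.tanh(
        CAND_W_SHORT * short_term + CAND_W_INPUT * input_value + CAND_BIAS
    )
    # Left block: a sigmoid decides what percentage of the candidate to add.
    fraction = sigmoid(
        GATE_W_SHORT * short_term + GATE_W_INPUT * input_value + GATE_BIAS
    )
    return candidate, fraction


# Hypothetical long-term memory left over after the forget gate.
long_term_after_forget = 1.5
candidate, fraction = second_stage(short_term=1.0, input_value=1.0)
new_long_term = long_term_after_forget + fraction * candidate
print(f"candidate={candidate:.4f}  fraction={fraction:.4f}  new long-term={new_long_term:.4f}")
```

The candidate always lies between −1 and 1 and the fraction between 0 and 1, so the second stage can nudge the long-term memory up or down by at most one unit per step under these assumptions.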

Let us plug in the numbers to see how a potential memory is created and how much of it is added to the long‑term memory.

We will continue exploring this in the next article.
