What is the DIETClassifier?

Published: February 7, 2026 at 08:29 PM EST
3 min read
Source: Dev.to


DIET stands for Dual Intent and Entity Transformer.
It is a single neural network that performs:

  • Intent classification
  • Entity extraction

Unlike CRFEntityExtractor, which focuses only on entities, DIET jointly learns:

  • The meaning of the full sentence (intent)
  • The role of each token (entity labels)

This shared learning allows the model to use intent‑level context to improve entity prediction, and vice versa.


Why was DIET introduced?

Traditional pipelines looked like this:

  1. Intent classifier → predicts intent
  2. Entity extractor → predicts entities independently

Drawbacks of this separation:

  • Duplicate feature computation
  • No shared understanding between intent and entities
  • More models to train, tune, and maintain

DIET solves this by using one model to learn shared embeddings and optimize both tasks together, leading to better performance, especially when training data is limited.


How DIET works

DIET is based on a Transformer architecture. At a high level, it:

  1. Tokenizes the input text
  2. Converts tokens into embeddings
  3. Applies transformer layers to model context

and predicts:

  • Sentence embedding → intent
  • Token‑level labels → entities

Instead of hand‑engineered features (as in CRF), DIET learns features automatically.
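The three steps above, plus the two prediction heads, can be sketched as a toy forward pass. This is an illustration of the shared‑encoder, dual‑head idea only, not Rasa's actual implementation; all dimensions and weights below are made up:

```python
import numpy as np

# Toy sketch of DIET's dual-output idea (illustrative, not Rasa's code).
# A shared encoder produces one vector per token; the pooled sentence
# vector feeds the intent head, each token vector feeds the entity head.

rng = np.random.default_rng(0)
EMB_DIM, N_INTENTS, N_ENTITY_TAGS = 8, 3, 4  # hypothetical sizes

tokens = ["book", "a", "flight", "to", "paris"]

# Stand-in for "tokenize + embed + transformer layers":
# random contextual vectors, one per token.
token_vecs = rng.normal(size=(len(tokens), EMB_DIM))

# Sentence embedding: mean-pool the token vectors (one common choice).
sentence_vec = token_vecs.mean(axis=0)

# Two task heads sharing the same encoder output.
intent_head = rng.normal(size=(EMB_DIM, N_INTENTS))
entity_head = rng.normal(size=(EMB_DIM, N_ENTITY_TAGS))

intent_scores = sentence_vec @ intent_head   # one score per intent
entity_scores = token_vecs @ entity_head     # one score row per token

print(intent_scores.shape)  # (3,)
print(entity_scores.shape)  # (5, 4)
```

The point is the sharing: both heads read the same contextual vectors, so gradients from the intent loss and the entity loss update the same encoder.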


Intent classification with DIET

For intent classification, DIET:

  • Embeds the entire sentence
  • Compares it against learned intent embeddings
  • Uses similarity scoring to choose the best intent

Example

“Book a flight to Paris.”

The model learns that this sentence embedding is closest to the book_flight intent, allowing DIET to generalize well to paraphrases and unseen phrasing.
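The similarity scoring step can be sketched in a few lines. The embeddings below are hypothetical 2‑D vectors chosen for readability; in practice DIET learns dense embeddings for both sentences and intent labels:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical learned embeddings (2-D only for readability).
sentence = np.array([0.9, 0.1])          # "Book a flight to Paris."
intents = {
    "book_flight": np.array([1.0, 0.0]),
    "greet":       np.array([0.0, 1.0]),
}

# Score the sentence against every intent embedding, pick the best.
scores = {name: cosine(sentence, vec) for name, vec in intents.items()}
best = max(scores, key=scores.get)
print(best)  # book_flight
```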


Entity extraction with DIET

DIET performs token‑level classification, similar to CRF. Each token receives labels like B-entity, I-entity, O, etc.

Book    O
a       O
flight  O
from    O
New     B-location
York    I-location
to      O
Paris   B-location

The difference is that DIET uses contextual embeddings produced by transformers instead of manually designed features.
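Turning per‑token B/I/O labels like the ones above into entity values is a standard post‑processing step. A minimal sketch (not Rasa's internal implementation):

```python
def bio_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into entity dicts."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # start of a new entity
            if current:
                entities.append(current)
            current = {"entity": tag[2:], "value": token}
        elif tag.startswith("I-") and current and current["entity"] == tag[2:]:
            current["value"] += " " + token  # continuation of the entity
        else:                              # "O" tag or broken sequence
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

tokens = ["Book", "a", "flight", "from", "New", "York", "to", "Paris"]
tags   = ["O", "O", "O", "O", "B-location", "I-location", "O", "B-location"]
print(bio_to_entities(tokens, tags))
# [{'entity': 'location', 'value': 'New York'},
#  {'entity': 'location', 'value': 'Paris'}]
```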


Training data format

DIET uses the same annotated NLU data as CRF.

version: "3.1"

nlu:
  - intent: book_flight
    examples: |
      - Book a flight from [New York](location) to [Paris](location)
      - Fly from [Berlin](location) to [London](location)

There is no separate configuration for intent vs. entity training; DIET learns both from the same data.
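DIET is enabled in the pipeline section of config.yml. A typical setup looks like the following; the featurizer choices and the epochs value are common starting points, not requirements:

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
```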


Internal working (simplified)

At runtime, DIET:

  1. Tokenizes the message
  2. Generates embeddings
  3. Applies transformer layers

Then it predicts:

  • An intent, with a confidence score
  • An entity label for each token

and finally groups consecutive tagged tokens into entity values.

Example output

{
  "intent": {
    "name": "book_flight",
    "confidence": 0.92
  },
  "entities": [
    {
      "entity": "location",
      "value": "Paris",
      "start": 23,
      "end": 28
    }
  ]
}
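Downstream code can consume this payload with plain dictionary access. A small sketch using the example output above (field names taken from that example):

```python
import json

# The prediction payload from the example above, as a JSON string.
payload = json.loads("""{
  "intent": {"name": "book_flight", "confidence": 0.92},
  "entities": [
    {"entity": "location", "value": "Paris", "start": 23, "end": 28}
  ]
}""")

intent = payload["intent"]["name"]
locations = [e["value"] for e in payload["entities"]
             if e["entity"] == "location"]
print(intent, locations)  # book_flight ['Paris']
```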

When should you use DIETClassifier?

DIETClassifier is the default choice when you want a single model for intents and entities, especially when:

  • The language is flexible and conversational
  • You need long‑term scalability or are building production‑grade assistants

CRFEntityExtractor and RegexEntityExtractor still have value for highly structured or deterministic entities, but DIET is the backbone of modern Rasa NLU pipelines.
