Understanding the RegexEntityExtractor in RASA

Published: 4 hours ago (January 19, 2026 at 07:00 AM EST)

Source: Dev.to

What is RegexEntityExtractor?

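RegexEntityExtractor is a rule-based Rasa NLU component that extracts entities using regular expressions defined in the training data. If part of a message matches a predefined pattern, the matched text is extracted as an entity.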
This makes the extractor:

Deterministic
Fast
Extremely precise (when patterns are well‑defined)

Why do we need it?

Use it for entities that:

Follow a fixed format
Are numerical or otherwise structured
Do not benefit from ML generalization

Typical examples:

Phone numbers
Email addresses
Order IDs
Dates
ZIP codes

Training a machine‑learning model to extract such patterns is often overkill.
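
As a rough illustration of why ML is often unnecessary here, each of the formats listed above can usually be captured by a single pattern. The sketch below uses plain Python `re`, not Rasa itself. The patterns are simplified assumptions (the `ORD-` order ID scheme in particular is hypothetical), and real formats vary by locale and system.

```python
import re
from typing import Dict, List

# Illustrative patterns only; these are assumptions, not patterns shipped with Rasa.
PATTERNS = {
    "phone_number": re.compile(r"\b\d{10}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "order_id": re.compile(r"\bORD-\d{6}\b"),  # hypothetical ID scheme
    "zip_code": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def find_all(text: str) -> Dict[str, List[str]]:
    """Return every match per entity name; names without a match are omitted."""
    found: Dict[str, List[str]] = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


print(find_all("Mail abc@gmail.com about ORD-123456"))
```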

YAML Configuration Example

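version: "3.1"

nlu:
  # The regex name must match the name of the entity to extract.
  - regex: phone_number
    examples: |
      - \d{10}
  # Hypothetical intent: annotated examples register the entity at training time.
  - intent: inform_phone
    examples: |
      - my number is [9876543210](phone_number)
      - call me on [9123456780](phone_number)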

Pipeline Configuration

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor

Internal Working

Takes the raw user message.
Iterates over each regex pattern defined in the NLU training data.
Applies the pattern to the text.
If a match is found:
- Extracts the matched substring.
- Assigns it as an entity.
- Stores start and end character indices.
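
The steps above can be sketched in a few lines of plain Python. This is a simplified approximation of the matching loop, not Rasa's actual implementation. The `extract_entities` function and the `PATTERNS` table (a hypothetical ten-digit phone pattern) are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Assumed pattern table: entity name -> regex string (hypothetical example).
PATTERNS: Dict[str, str] = {"phone_number": r"\d{10}"}


def extract_entities(text: str, patterns: Dict[str, str]) -> List[dict]:
    """Apply every pattern to the raw text and record each match as an entity."""
    entities = []
    for name, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            entities.append(
                {
                    "entity": name,
                    "value": match.group(0),  # the matched substring
                    "start": match.start(),   # index of the first matched character
                    "end": match.end(),       # index one past the last matched character
                }
            )
    return entities


print(extract_entities("My phone number is 9876543210", PATTERNS))
```

The `end` index is exclusive, so `text[start:end]` returns the entity value.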

Example

User message:

My phone number is 9876543210

Extracted entity:

{
  "entity": "phone_number",
  "value": "9876543210",
  "start": 19,
  "end": 29
}

Combining with Entity Synonym Mapper

RegexEntityExtractor extracts the raw entity.
EntitySynonymMapper, listed after the extractor in the pipeline, normalizes it to a canonical value.

This combination provides:

High precision
Consistency
Clean downstream data
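
A minimal sketch of this two-stage flow in plain Python is shown below. It only imitates the idea of "extract, then normalize". The `priority` pattern, the synonym table, and both function names are hypothetical, and the case-insensitive lookup is a simplifying assumption rather than a description of Rasa's internals.

```python
import re
from typing import Dict, List

# Hypothetical pattern and synonym table, for illustration only.
PATTERNS: Dict[str, str] = {"priority": r"\b(?:P1|urgent|critical)\b"}
SYNONYMS: Dict[str, str] = {"p1": "P1", "urgent": "P1", "critical": "P1"}


def extract(text: str) -> List[dict]:
    """Stage 1: regex matching yields the raw entity values."""
    return [
        {"entity": name, "value": m.group(0), "start": m.start(), "end": m.end()}
        for name, pattern in PATTERNS.items()
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]


def normalize(entities: List[dict], synonyms: Dict[str, str]) -> List[dict]:
    """Stage 2: replace each raw value with its canonical form, if one is known."""
    return [
        {**e, "value": synonyms.get(e["value"].lower(), e["value"])}
        for e in entities
    ]


print(normalize(extract("This ticket is Urgent"), SYNONYMS))
```

The character offsets still point at the original text, while `value` carries the canonical form that downstream code sees.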

When Should RegexEntityExtractor Be Used?

When the entity format is predictable.
When precision matters more than recall.
When you want to reduce ML complexity.
When deterministic behavior is required.

Next Topic

We will explore CRFEntityExtractor, where entities are learned statistically rather than matched explicitly.
