Understanding the RegexEntityExtractor in Rasa

Published: January 19, 2026 at 07:00 AM EST
1 min read
Source: Dev.to

What is RegexEntityExtractor?

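RegexEntityExtractor is a Rasa NLU pipeline component that extracts entities using regular expressions defined in the training data. If part of a user message matches a predefined pattern, that span is extracted as an entity.
This makes the extractor: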

  • Deterministic
  • Fast
  • Extremely precise (when patterns are well‑defined)

Why do we need it?

Use it for entities that:

  • Follow a fixed format
  • Are numerical or otherwise structured
  • Do not benefit from ML generalization

Typical examples:

  • Phone numbers
  • Email addresses
  • Order IDs
  • Dates
  • ZIP codes

Training a machine‑learning model to extract such patterns is often overkill.
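
To make the "fixed format" idea concrete, here is a minimal sketch in plain Python (not Rasa code) with one illustrative pattern per entity type listed above. The names `EXAMPLE_PATTERNS` and `find_all`, the `ORD-` order ID format, and every pattern are assumptions chosen for illustration; real formats vary by country and system.

```python
import re

# Illustrative patterns only (assumptions, not taken from Rasa or any standard).
EXAMPLE_PATTERNS = {
    "phone_number": r"\b\d{10}\b",                 # exactly ten digits
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",  # simplified email shape
    "order_id": r"\bORD-\d{6}\b",                  # hypothetical order ID format
    "date": r"\b\d{4}-\d{2}-\d{2}\b",              # ISO-style YYYY-MM-DD
    "zip_code": r"\b\d{5}(?:-\d{4})?\b",           # US ZIP or ZIP+4
}


def find_all(text):
    """Return {entity_name: [matched strings]} for every pattern that matches."""
    found = {}
    for name, pattern in EXAMPLE_PATTERNS.items():
        matches = re.findall(pattern, text)
        if matches:
            found[name] = matches
    return found


print(find_all("Order ORD-123456 placed on 2026-01-19, contact abc@gmail.com"))
```

Each pattern encodes the whole format in a single line, which is why a learned model adds little for entities like these.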

YAML Configuration Example

version: "3.1"

nlu:
  - regex: phone_number
    examples: |
      - \d{10}

Pipeline Configuration

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor

Internal Working

  1. Takes the raw user message.
  2. Iterates over each regex pattern defined in the YAML training data; the name of the regex (for example, phone_number) becomes the entity name.
  3. Applies the pattern to the text.
  4. If a match is found:
    • Extracts the matched substring.
    • Assigns it as an entity.
    • Stores start and end character indices.
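
The steps above can be sketched in plain Python. This is a simplified illustration built on the standard `re` module, not Rasa's actual implementation; the `PATTERNS` table, the ten-digit `phone_number` pattern, and the `extract_entities` function are assumptions made for the example.

```python
import re

# Hypothetical pattern table: entity name -> regex (assumed ten-digit phone number).
PATTERNS = {"phone_number": r"\d{10}"}


def extract_entities(text, patterns=PATTERNS):
    """Return one entity dict per regex match, including character offsets."""
    entities = []
    for name, pattern in patterns.items():          # step 2: iterate over patterns
        for match in re.finditer(pattern, text):    # step 3: apply pattern to text
            entities.append(
                {
                    "entity": name,                 # entity named after the regex
                    "value": match.group(0),        # matched substring
                    "start": match.start(),         # inclusive start index
                    "end": match.end(),             # exclusive end index
                }
            )
    return entities


print(extract_entities("My phone number is 9876543210"))
# [{'entity': 'phone_number', 'value': '9876543210', 'start': 19, 'end': 29}]
```

Note that `end` is exclusive, so `text[start:end]` returns the matched value.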

Example

User message:

My phone number is 9876543210

Extracted entity:

{
  "entity": "phone_number",
  "value": "9876543210",
  "start": 19,
  "end": 29
}

Combining with Entity Synonym Mapper

  • RegexEntityExtractor extracts the raw entity.
  • EntitySynonymMapper normalizes it to a canonical value.

This combination provides:

  • High precision
  • Consistency
  • Clean downstream data
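
The extract-then-normalize idea can be sketched as a dictionary lookup applied to an already extracted entity. This is an illustration only, not the EntitySynonymMapper source; the `SYNONYMS` table, the `country_code` entity, and the case-insensitive lookup are assumptions of this sketch.

```python
# Hypothetical synonym table: lowercase surface form -> canonical value.
SYNONYMS = {"usa": "US", "u.s.": "US", "united states": "US"}


def normalize_entity(entity, synonyms=SYNONYMS):
    """Return a copy of the entity whose value is replaced by its canonical form.

    Values without a known synonym are returned unchanged.
    """
    raw_value = str(entity["value"])
    canonical = synonyms.get(raw_value.lower(), entity["value"])
    return {**entity, "value": canonical}


extracted = {"entity": "country_code", "value": "USA", "start": 10, "end": 13}
print(normalize_entity(extracted))
# {'entity': 'country_code', 'value': 'US', 'start': 10, 'end': 13}
```

In this sketch the `start` and `end` offsets still point at the original text, so the raw span stays recoverable after normalization.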

When Should RegexEntityExtractor Be Used?

  • When the entity format is predictable.
  • When precision matters more than recall.
  • When you want to reduce ML complexity.
  • When deterministic behavior is required.

Next Topic

We will explore CRFEntityExtractor, where entities are learned statistically rather than matched explicitly.
