Understanding the RegexEntityExtractor in Rasa
Source: Dev.to
What is RegexEntityExtractor?
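RegexEntityExtractor is a Rasa NLU component that extracts entities using regular expressions defined in the training data. If part of the text matches a predefined pattern, it is extracted as an entity.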
This makes the extractor:
- Deterministic
- Fast
- Extremely precise (when patterns are well‑defined)
Why do we need it?
Use it for entities that:
- Follow a fixed format
- Are numerical or otherwise structured
- Do not benefit from ML generalization
Typical examples:
- Phone numbers
- Email addresses
- Order IDs
- Dates
- ZIP codes
Training a machine‑learning model to extract such patterns is often overkill.
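To make the point concrete, here is a minimal sketch of how a few of these formats can be matched with plain regular expressions. The patterns and the `find_all` helper are illustrative assumptions, not part of Rasa: the phone pattern assumes 10-digit numbers, the order ID scheme (`ORD-` plus six digits) is hypothetical, and the email pattern is deliberately simplified.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real formats vary by country and business rules.
PATTERNS: Dict[str, str] = {
    "phone_number": r"\b\d{10}\b",                 # assumes 10-digit numbers
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",  # simplified, not RFC-complete
    "order_id": r"\bORD-\d{6}\b",                  # hypothetical ID scheme
    "zip_code": r"\b\d{5}(?:-\d{4})?\b",           # US ZIP or ZIP+4
}


def find_all(text: str) -> Dict[str, List[str]]:
    """Return every match of each pattern, keyed by entity name."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}


print(find_all("Order ORD-123456 for abc@gmail.com, ZIP 94105"))
```

No training data or model is involved: each pattern either matches or it does not.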
YAML Configuration Example
The regex name should match the entity it extracts, and each item under examples is a pattern rather than a sample value:
version: "3.1"
nlu:
- regex: phone_number
  examples: |
    - \d{10}
Pipeline Configuration
pipeline:
- name: WhitespaceTokenizer
- name: RegexEntityExtractor
Internal Working
- Takes the raw user message.
- Iterates over each regex pattern defined in the YAML file.
- Applies the pattern to the text.
- If a match is found:
  - Extracts the matched substring.
  - Assigns it as an entity.
  - Stores start and end character indices.
Example
User message:
My phone number is 9876543210
Extracted entity:
{
  "entity": "phone_number",
  "value": "9876543210",
  "start": 19,
  "end": 29
}
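The steps and the example above can be sketched in a few lines of Python. This is a simplified illustration of the matching loop, not Rasa's actual implementation; the `extract_entities` function and the `\b\d{10}\b` pattern are assumptions made for this example.

```python
import re
from typing import Dict, List


def extract_entities(text: str, patterns: Dict[str, str]) -> List[dict]:
    """Apply each named pattern to the text and collect matches as entity dicts."""
    entities = []
    for entity_name, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            entities.append({
                "entity": entity_name,
                "value": match.group(0),
                "start": match.start(),  # index of the first matched character
                "end": match.end(),      # index one past the last matched character
            })
    return entities


entities = extract_entities(
    "My phone number is 9876543210",
    {"phone_number": r"\b\d{10}\b"},
)
print(entities)
```

The `end` index is exclusive, so `text[start:end]` returns the extracted value.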
Combining with Entity Synonym Mapper
- RegexEntityExtractor extracts the raw entity.
- EntitySynonymMapper normalizes it to a canonical value.
This combination provides:
- High precision
- Consistency
- Clean downstream data
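The two-step idea can be sketched as follows. This is a rough illustration under stated assumptions, not the code of either Rasa component: the `currency` entity, its pattern, the `SYNONYMS` table, and the `extract` and `normalize` helpers are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical synonym table: lowercase surface form -> canonical value.
SYNONYMS: Dict[str, str] = {"usd": "USD", "dollars": "USD", "bucks": "USD"}


def extract(text: str, entity: str, pattern: str) -> List[dict]:
    """Step 1: regex extraction of the raw surface form."""
    return [
        {"entity": entity, "value": m.group(0), "start": m.start(), "end": m.end()}
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]


def normalize(entities: List[dict], synonyms: Dict[str, str]) -> List[dict]:
    """Step 2: replace each value with its canonical form when one is known."""
    return [
        {**e, "value": synonyms.get(e["value"].lower(), e["value"])}
        for e in entities
    ]


raw = extract("I paid 20 bucks", "currency", r"\b(?:usd|dollars|bucks)\b")
print(normalize(raw, SYNONYMS))
```

The character offsets still point at the original text ("bucks"), while the value carries the canonical form. In a Rasa pipeline, the equivalent ordering is to list EntitySynonymMapper after RegexEntityExtractor.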
When Should RegexEntityExtractor Be Used?
- When the entity format is predictable.
- When precision matters more than recall.
- When you want to reduce ML complexity.
- When deterministic behavior is required.
Next Topic
We will explore CRFEntityExtractor, where entities are learned statistically rather than matched explicitly.