Python Regex Explained Simply — Extract Anything From Text

Published: June 12, 2026 at 01:59 AM EDT
Source: Dev.to

Regex sounds intimidating. It is not. Once you understand 5 core patterns and 3 functions, you can extract almost any pattern from text in seconds. Here is everything you need to know.

Regex is a pattern language. You describe what you are looking for using special characters, and Python finds every occurrence of it in a block of text of any size.

Real example: your client sends you a document with 500 customer records mixed with random text. They need all the email addresses extracted into Excel. Without regex this takes hours. With regex it takes 3 lines:

```python
import re

text = "Contact john@gmail.com or sales@company.com for details"
emails = re.findall(r'[\w.-]+@[\w.-]+\.\w+', text)  # \. = a literal dot before the extension
print(emails)
# ['john@gmail.com', 'sales@company.com']
```
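Why the backslash in `\.` matters: an unescaped `.` matches any character at all, so the looser pattern can accept text that is not an email address. The sample sentence below is made up for illustration; the two patterns differ only in that one backslash.

```python
import re

sample = "contact bob@localhost for access"

# Unescaped '.' matches ANY character, even the space after 'localhost',
# so the match runs on into the next word.
loose = re.findall(r'[\w.-]+@[\w.-]+.\w+', sample)

# Escaped '\.' requires a literal dot before the extension, so nothing matches.
strict = re.findall(r'[\w.-]+@[\w.-]+\.\w+', sample)

print(loose)   # ['bob@localhost for']
print(strict)  # []
```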

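The 5 core patterns, each shown with re.findall() and its result:

```python
import re

# \d matches a single digit
re.findall(r'\d', 'abc123def456')
# ['1', '2', '3', '4', '5', '6']

# \w+ matches a run of word characters (letters, digits, underscore)
re.findall(r'\w+', 'hello world_123')
# ['hello', 'world_123']

# \d+ matches one or more digits in a row
re.findall(r'\d+', 'price is 45000 and tax is 8100')
# ['45000', '8100']

# [aeiou] matches any one character from the set
re.findall(r'[aeiou]', 'hello world')
# ['e', 'o', 'o']

# . matches any single character (except a newline)
re.findall(r'c.t', 'cat cut cot bat')
# ['cat', 'cut', 'cot']
```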
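These patterns combine. The sketch below reuses `\w+` and `\d+` and wraps each in parentheses (capture groups, also used in the re.search() example later on). When a pattern contains groups, re.findall() returns tuples of the captured parts instead of whole matches, which makes it easy to build a dict. The "label is number" sentence format is an assumption for this example.

```python
import re

text = 'price is 45000 and tax is 8100'

# (\w+) captures the label, (\d+) captures the number after " is ".
pairs = re.findall(r'(\w+) is (\d+)', text)
print(pairs)    # [('price', '45000'), ('tax', '8100')]

# Turn the captured pairs into a lookup table with integer values.
amounts = {label: int(number) for label, number in pairs}
print(amounts)  # {'price': 45000, 'tax': 8100}
```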

re.findall() returns a list of everything that matches the pattern.

```python
import re

text = "Prices: ₹45,000 and ₹12,500 and ₹8,750"
prices = re.findall(r'[\d,]+', text)  # runs of digits and commas
print(prices)
# ['45,000', '12,500', '8,750']
```
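re.findall() always returns strings. To total the prices or load them into a numeric spreadsheet column, strip the thousands separators and convert. A minimal sketch, assuming every match is a whole-rupee amount with comma separators and no decimals:

```python
import re

text = "Prices: ₹45,000 and ₹12,500 and ₹8,750"
prices = re.findall(r'[\d,]+', text)

# Remove the commas, then convert each string to an int.
amounts = [int(p.replace(',', '')) for p in prices]
print(amounts)       # [45000, 12500, 8750]
print(sum(amounts))  # 66250
```

One caveat: `[\d,]+` also matches a lone comma elsewhere in the text (for example in "a, b"), and int('') would then raise a ValueError. A pattern such as `\d[\d,]*`, which must start with a digit, avoids that.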

re.sub() replaces every match with something else.

```python
import re

messy = "phone: 98-765-43210"
clean = re.sub(r'\D', '', messy)  # \D = any non-digit; replacing it with '' removes it
print(clean)
# 9876543210
```
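The replacement does not have to be empty. This sketch swaps every number for a placeholder, for example to anonymise text before sharing it; the `<NUM>` token is an arbitrary choice. re.sub() also takes an optional count argument that limits how many matches are replaced.

```python
import re

text = 'price is 45000 and tax is 8100'

# Replace every run of digits with a placeholder token.
masked = re.sub(r'\d+', '<NUM>', text)
print(masked)      # price is <NUM> and tax is <NUM>

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', '<NUM>', text, count=1)
print(first_only)  # price is <NUM> and tax is 8100
```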

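re.search() returns only the first match, as a match object that also records where the match was found (or None if nothing matches).

```python
import re

text = "Order #A12345 placed successfully"
match = re.search(r'#(\w+)', text)  # (\w+) captures the part after '#'
if match:
    print(match.group(1))  # A12345
```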
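Because re.search() returns None when nothing matches, wrapping it in a small function keeps that check in one place. In this sketch the function name find_order_id is hypothetical; it uses match.start() to report where the match begins and shows that a second order number in the same text is ignored.

```python
import re

def find_order_id(text):
    """Return (order_id, index_of_hash) for the first '#...' token, or None."""
    match = re.search(r'#(\w+)', text)
    if match is None:
        return None
    # group(1) is the captured ID; start() is where the whole match (the '#') begins.
    return match.group(1), match.start()

print(find_order_id("Order #A12345 placed successfully"))  # ('A12345', 6)
print(find_order_id("Orders #B1 and #B2 shipped"))          # ('B1', 7)
print(find_order_id("No order number here"))                # None
```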

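Client problem: they have a spreadsheet with phone numbers in 6 different formats (four are shown in the sample below), and they need them all standardised to 10 digits.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Phone': ['9876543210', '+91-9876543210', '(080) 4567-8901', '91 98765 43210']
})

def clean_phone(phone):
    digits = re.sub(r'\D', '', phone)  # keep digits only
    if len(digits) == 10:
        return digits
    elif len(digits) == 12 and digits.startswith('91'):
        return digits[2:]  # drop the 91 country code
    elif len(digits) == 11 and digits.startswith('0'):
        return digits[1:]  # drop the leading 0 of an area code such as (080)
    return None  # format not recognised

df['Clean'] = df['Phone'].apply(clean_phone)
print(df)
# Clean column: 9876543210, 9876543210, 8045678901, 9876543210
```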
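If you want to send the client a list of the numbers that could not be standardised, or want to skip pandas entirely, here is a standard-library sketch. It compiles the non-digit pattern once with re.compile() (the re module also caches recently used patterns, so this is mostly for readability) and sorts each input into cleaned or rejected. The name standardise_phones and its rule set (10 digits, 91 plus 10 digits, or 0 plus 10 digits) are assumptions for illustration; adjust them to the formats your data actually contains.

```python
import re

NON_DIGIT = re.compile(r'\D')  # compiled once, reused for every row

def standardise_phones(phones):
    """Split raw phone strings into (cleaned, rejected) lists.

    Assumed rules: keep 10-digit numbers as-is, strip a leading '91'
    from 12-digit numbers and a leading '0' from 11-digit numbers.
    """
    cleaned, rejected = [], []
    for raw in phones:
        digits = NON_DIGIT.sub('', raw)
        if len(digits) == 10:
            cleaned.append(digits)
        elif len(digits) == 12 and digits.startswith('91'):
            cleaned.append(digits[2:])
        elif len(digits) == 11 and digits.startswith('0'):
            cleaned.append(digits[1:])
        else:
            rejected.append(raw)  # keep the original text so it can be reported back
    return cleaned, rejected

good, bad = standardise_phones(['+91-9876543210', '(080) 4567-8901', 'ext. 42'])
print(good)  # ['9876543210', '8045678901']
print(bad)   # ['ext. 42']
```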

Regex is a pattern language: you describe what you are looking for, and Python finds every instance of it in text of any size. Learn these 5 patterns and 3 functions and you can handle most real data-extraction gigs right away.

Written by Raaga Priya Madhan, CSE student, Bangalore. I build Python automation and data extraction scripts. See my work on GitHub and connect on LinkedIn.
