Code Smell 317 - Email Handling Vulnerabilities

Published: 3 weeks ago (December 23, 2025 at 06:00 AM EST)

Source: Dev.to

TL;DR

You normalize email for lookup but trust UI data for delivery, breaking identity ownership.

Problems:

UI trust
Identity drift
Unicode confusion
String identity
Boundary breach
Collation confusion
Security bypass
Account takeover
Email spoofing

Solutions:

Server owns identity
Never trust UI input
Use strict collation
Use canonical emails
Normalize once
Persist then act
Implement Multi‑Factor Authentication

Refactorings ⚙️

Refactoring	Author	Date	Tags
Refactoring 019 – Reify Email Addresses	Maxi Contieri	Dec 5 ‘24	`#javascript` `#refactoring` `#designpatterns` `#beginners`
Refactoring 016 – Build With The Essence	Maxi Contieri	Sep 16 ‘24	`#webdev` `#beginners` `#programming` `#tutorial`
Refactoring 034 – Reify Parameters	Maxi Contieri	Oct 7	`#webdev` `#programming` `#javascript` `#beginners`

The Vulnerability

When user input contains Unicode characters, different components of your system can interpret the same string in different ways.

Some database engines with accent-insensitive collations (e.g., MySQL's utf8mb4_unicode_ci) treat characters with diacritics as equal to their unaccented ASCII counterparts.
- Example: 'à' equals 'a'.
Email servers, programming languages, and most other components typically compare the underlying code points, so they treat these characters as different.

This inconsistency creates a dangerous security vulnerability.
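The mismatch can be reproduced in a few lines. The sketch below is only an illustration: it uses an in-memory SQLite database with a hand-written collation named accent_ci (a hypothetical stand-in that roughly imitates accent-insensitive behaviour, not MySQL's actual utf8mb4_unicode_ci algorithm) and compares the database's answer with Python's own string equality.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks and case: a rough accent-insensitive key."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def accent_ci(left: str, right: str) -> int:
    """Collation callback: negative, zero, or positive, like a comparator."""
    a, b = fold(left), fold(right)
    return (a > b) - (a < b)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_ci)  # hypothetical collation name
conn.execute("CREATE TABLE users (email TEXT COLLATE accent_ci)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("victim@gmail.com",))

submitted = "victim@gm\u00e0il.com"  # U+00E0 is 'à'
row = conn.execute(
    "SELECT email FROM users WHERE email = ?", (submitted,)
).fetchone()

database_says_equal = row is not None    # the collation matched the victim's row
python_says_equal = submitted == row[0]  # Python compares code points

print(database_says_equal, python_says_equal)
```

Under these assumptions the database reports a match while Python reports two different strings, which is the gap the attack relies on.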

Attack Flow

An attacker registers an email address on a lookalike domain, such as attacker@gmàil.com (Unicode à). Because the attacker controls that domain, they can also receive mail addressed to victim@gmàil.com.
The attacker requests a password reset for the victim’s legitimate account victim@gmail.com (ASCII a) but fills the email field with victim@gmàil.com.
The database collation treats both addresses as equal, so the query matches the victim’s row.
The application mistakenly uses the untrusted UI input to send the reset email, delivering it to the attacker’s Unicode address.

Result: The attacker gains full control of the victim’s account.

You violate the fundamental security principle: never trust data from the UI. Always use the canonical values stored in your database for security‑critical operations.

Vulnerable Code (Python)

def reset_password(email_from_ui):
    # email_from_ui = "victim@gmàil.com"   # attacker’s Unicode address from UI

    # Database uses utf8mb4_unicode_ci collation → 'à' == 'a'
    cursor.execute(
        "SELECT * FROM users WHERE email = %s",
        (email_from_ui,)
    )
    user = cursor.fetchone()

    if user:
        # CRITICAL MISTAKE: trusting UI data
        # Sends email to the attacker’s Unicode address
        send_reset_email(email_from_ui)   # ❌ should use user['email']
        return True
    return False

Attack Scenario

Stored in DB	Attacker controls	UI input	Collation match?	Email sent to
`victim@gmail.com` (ASCII)	`attacker@gmàil.com` (Unicode)	`victim@gmàil.com`	Yes	`victim@gmàil.com` (attacker)

Defensive Refactor (Python)

import unicodedata

def normalize_email(email: str) -> str:
    """
    Convert to NFKC normalized form, ensure ASCII only,
    and return a lower‑cased string.
    """
    normalized = unicodedata.normalize('NFKC', email)

    # Reject non‑ASCII characters
    try:
        normalized.encode('ascii')
    except UnicodeEncodeError as exc:
        raise ValueError("Email contains non‑ASCII characters.") from exc

    return normalized.lower()


def reset_password(email_from_ui: str) -> bool:
    # DEFENSE 1: Normalize and validate input
    try:
        normalized_email = normalize_email(email_from_ui)
    except ValueError:
        # Reject non‑ASCII emails immediately
        return False

    cursor.execute(
        "SELECT * FROM users WHERE email = %s",
        (normalized_email,)
    )
    user = cursor.fetchone()

    if user:
        # DEFENSE 2: NEVER trust UI data
        # Always use the canonical email from the database
        database_email = user['email']
        send_reset_email(database_email)   # ✅ correct
        return True
    return False

Result:

The attacker’s victim@gmàil.com is rejected because it contains non‑ASCII characters.
Even if it passed validation, the reset email would go to user['email'] (the stored, trusted value), not to the string the attacker typed.

Detection Checklist

Run static analysis to find patterns where UI input is used directly in external communications (e.g., send_email(user_input)).
Use Unicode fuzzers to test input handling.
Review all authentication‑related code for raw UI data usage.
Verify that email validation enforces ASCII‑only (or proper IDN handling) before any lookup.
Ensure database collation settings are strict (utf8mb4_bin or similar) and that Unicode normalization is applied consistently.
Flag any code that uses the original user input after a successful DB lookup – this is the core vulnerability.
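For the "proper IDN handling" option in the checklist, one possibility is to convert the domain to its ASCII (Punycode) form before comparing or sending. The sketch below assumes Python's built-in idna codec, which implements the older IDNA 2003 rules; domain_on_the_wire is a hypothetical helper name, and a production system may prefer a maintained IDNA 2008 library.

```python
def domain_on_the_wire(email: str) -> str:
    """Return the ASCII-compatible form of the address's domain part."""
    _local, _sep, domain = email.rpartition("@")
    return domain.encode("idna").decode("ascii")


legit = domain_on_the_wire("victim@gmail.com")
lookalike = domain_on_the_wire("victim@gm\u00e0il.com")  # U+00E0 is 'à'

print(legit)                         # gmail.com
print(lookalike == legit)            # False
print(lookalike.startswith("xn--"))  # True: Punycode-encoded label
```

Seen this way, the lookalike is simply a different domain, which is how the mail infrastructure will treat it regardless of what the database collation says.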

Static analysis tools can flag:

UI input used without normalization.
Bypassing DB values in favor of user‑provided strings.
Mismatched collation settings across components.
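As a rough idea of what such a check could look like, the sketch below walks a Python syntax tree and reports calls to functions whose names start with send_ when they receive a raw function parameter. The send_ prefix, the find_raw_parameter_sinks name, and the sample sources are assumptions for illustration; a real analyzer would also need data-flow tracking, which this heuristic does not attempt.

```python
import ast


def find_raw_parameter_sinks(source: str, sink_prefix: str = "send_"):
    """Return (function, sink, parameter, line) for each suspicious call."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        params = {arg.arg for arg in func.args.args}
        for node in ast.walk(func):
            is_sink_call = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id.startswith(sink_prefix)
            )
            if not is_sink_call:
                continue
            for arg in node.args:
                # A bare parameter name passed straight to the sink is flagged.
                if isinstance(arg, ast.Name) and arg.id in params:
                    findings.append((func.name, node.func.id, arg.id, node.lineno))
    return findings


VULNERABLE = '''
def reset_password(email_from_ui):
    user = find_user(email_from_ui)
    if user:
        send_reset_email(email_from_ui)
'''

SAFER = '''
def reset_password(email_from_ui):
    user = find_user(email_from_ui)
    if user:
        send_reset_email(user["email"])
'''

print(find_raw_parameter_sinks(VULNERABLE))
print(find_raw_parameter_sinks(SAFER))
```

The first sample is reported because the UI parameter reaches the sink unchanged; the second is not, because the argument comes from the looked-up record.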

Advanced Guidance

Bijection Requirement: There must be a one‑to‑one mapping between real‑world email addresses and their representation in your system.
Normalization Discipline: Apply Unicode normalization once (preferably at input validation) and store the canonical form.
Collation Consistency: Use a binary collation (utf8mb4_bin) for email columns to avoid accidental equivalence of distinct Unicode strings.
Multi‑Factor Authentication (MFA): Complement email‑based flows with MFA to mitigate the impact of any residual email delivery issues.

By enforcing strict collation, canonical storage, and never trusting UI data for security‑critical actions, you eliminate the described attack vector.
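One way to apply "normalize once" is to wrap the address in a small value object, so that every later step works with the canonical form rather than the raw string. The sketch below is a minimal illustration under stated assumptions: the EmailAddress class and its parse method are hypothetical names, the ASCII-only policy follows the refactor above, and the syntax check is deliberately simplistic.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    """Immutable, canonical email value (hypothetical helper class)."""

    value: str

    @classmethod
    def parse(cls, raw: str) -> "EmailAddress":
        # Normalize exactly once, at the boundary.
        normalized = unicodedata.normalize("NFKC", raw).strip().lower()
        if not normalized.isascii():
            raise ValueError("Email contains non-ASCII characters.")
        local, sep, domain = normalized.partition("@")
        if not local or not sep or not domain or "@" in domain:
            raise ValueError("Email must look like local@domain.")
        return cls(normalized)


print(EmailAddress.parse("  Victim@Gmail.COM ").value)  # victim@gmail.com

try:
    EmailAddress.parse("victim@gm\u00e0il.com")  # U+00E0 is 'à'
except ValueError as error:
    print("rejected:", error)
```

The lookup would then use the parsed value, and the reset email would still go to the address stored in the matched row rather than to anything derived from the request.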

Unicode Normalization Smell

AI tools sometimes generate this smell because they are trained on poor code examples and focus on basic logic without considering encoding edge cases.

AI can fix it when you give clear prompts about normalization, security vulnerabilities, and stored‑data usage.

Remember: AI assistants make lots of mistakes.

Suggested Prompt

Model email as a server‑owned value object.
Normalize once.
After database lookup, discard UI input for security actions.

Prompting Styles

Without Proper Instructions	With Specific Instructions
ChatGPT	ChatGPT
Claude	Claude
Perplexity	Perplexity
Copilot	Copilot
You	You
Gemini	Gemini
DeepSeek	DeepSeek
Meta AI	Meta AI
Grok	Grok
Qwen	Qwen

Security Recommendations

Never use untrusted UI data for security‑critical operations (e.g., sending password‑reset emails).
Always normalize all user input to a canonical form and validate it strictly.
Use the canonical values from your database, not the user‑provided input, when performing authentication or sending security‑related communications.
The safest approach is to restrict email addresses to ASCII‑only characters and treat the database as the single source of truth.

Code Smells

Code Smell 189 – Not Sanitized Input

Maxi Contieri ・ Dec 28 ‘22

#webdev #beginners #programming #security

Code Smell 121 – String Validations

Maxi Contieri ・ Mar 13 ‘22

#webdev #programming #security

Disclaimer 📘

Code Smells are my opinion.

“Never trust input you do not control.” – Bruce Schneier

Photo by Aurèle Castellane on Unsplash

Software Engineering Great Quotes

Maxi Contieri ・ Dec 28 ‘20

#codenewbie #programming #quotes #software

How to Find the Stinky Parts of Your Code

Maxi Contieri ・ May 21 ‘21

#codenewbie #tutorial #codequality #beginners

This article is part of the CodeSmell Series.
