Code Smell 317 - Email Handling Vulnerabilities

Published: December 23, 2025 at 06:00 AM EST
5 min read
Source: Dev.to

TL;DR

You normalize email for lookup but trust UI data for delivery, breaking identity ownership.

Problems

  • UI trust
  • Identity drift
  • Unicode confusion
  • String identity
  • Boundary breach
  • Collation confusion
  • Security bypass
  • Account takeover
  • Email spoofing

Solutions

  • Server owns identity
  • Never trust UI input
  • Use strict collation
  • Use canonical emails
  • Normalize once
  • Persist then act
  • Implement Multi‑Factor Authentication

Refactorings ⚙️

  • Refactoring 019 – Reify Email Addresses ・ Maxi Contieri ・ Dec 5 ‘24 ・ #javascript #refactoring #designpatterns #beginners
  • Refactoring 016 – Build With The Essence ・ Maxi Contieri ・ Sep 16 ‘24 ・ #webdev #beginners #programming #tutorial
  • Refactoring 034 – Reify Parameters ・ Maxi Contieri ・ Oct 7 ・ #webdev #programming #javascript #beginners

The Vulnerability

When you handle user input containing Unicode characters, system components interpret them in many different ways.

  • Some database engines with certain collations (e.g., utf8mb4_unicode_ci) treat Unicode characters with diacritics as equal to their ASCII counterparts.
    • Example: 'à' equals 'a'.
  • Email servers, programming languages, and other systems distinguish between these characters.

This inconsistency creates a dangerous security vulnerability.
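
The mismatch can be sketched in a few lines of Python. The helper `accent_insensitive_key` is a hypothetical stand-in for an accent- and case-insensitive collation; it is not how any database engine implements one. It only strips combining marks after NFKD decomposition, which is enough to show the two views of the same pair of strings.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Rough imitation of an accent- and case-insensitive collation (assumption)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


stored = "victim@gmail.com"      # ASCII 'a'
submitted = "victim@gmàil.com"   # Unicode 'à'

# The programming language compares code points, so the strings differ.
print(stored == submitted)  # False

# An accent-insensitive comparison folds 'à' to 'a', so they match.
print(accent_insensitive_key(stored) == accent_insensitive_key(submitted))  # True
```

The first comparison is what a mail server or application code sees; the second approximates what a lenient collation sees.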

Attack Flow

  1. An attacker registers the lookalike domain gmàil.com (Unicode à), which lets them receive mail at addresses such as attacker@gmàil.com and victim@gmàil.com.
  2. The attacker requests a password reset for the victim’s legitimate account victim@gmail.com (ASCII a) and fills the email field with victim@gmàil.com.
  3. The database collation treats both addresses as equal, so the query matches the victim’s row.
  4. The application mistakenly uses the untrusted UI input to send the reset email, delivering it to the attacker’s Unicode address.

Result: The attacker gains full control of the victim’s account.

You violate the fundamental security principle: never trust data from the UI. Always use the canonical values stored in your database for security‑critical operations.

Vulnerable Code (Python)

def reset_password(email_from_ui):
    # email_from_ui = "victim@gmàil.com"   # attacker’s Unicode address from UI

    # Database uses utf8mb4_unicode_ci collation → 'à' == 'a'
    cursor.execute(
        "SELECT * FROM users WHERE email = %s",
        (email_from_ui,)
    )
    user = cursor.fetchone()

    if user:
        # CRITICAL MISTAKE: trusting UI data
        # Sends email to the attacker’s Unicode address
        send_reset_email(email_from_ui)   # ❌ should use user['email']
        return True
    return False

Attack Scenario

  • Stored in DB: victim@gmail.com (ASCII)
  • Attacker controls: attacker@gmàil.com (Unicode)
  • UI input: victim@gmàil.com
  • Collation match? Yes
  • Email sent to: victim@gmàil.com (attacker)
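
The scenario can be reproduced locally without MySQL. The sketch below is an approximation under stated assumptions: it uses an in-memory SQLite database with a custom collation named `accent_ci` (hypothetical, registered through `sqlite3`'s `create_collation`) to imitate an accent-insensitive collation, and a plain list called `outbox` in place of a real mail gateway.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Strip accents and case: a rough imitation of a lenient collation."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def accent_ci(left: str, right: str) -> int:
    """Collation callback: negative, zero, or positive, like a comparator."""
    a, b = fold(left), fold(right)
    return (a > b) - (a < b)


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.create_collation("accent_ci", accent_ci)
conn.execute("CREATE TABLE users (email TEXT COLLATE accent_ci)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("victim@gmail.com",))

outbox = []  # hypothetical stand-in for the mail gateway


def find_user(email: str):
    return conn.execute(
        "SELECT email FROM users WHERE email = ?", (email,)
    ).fetchone()


def vulnerable_reset(email_from_ui: str) -> bool:
    user = find_user(email_from_ui)
    if user:
        outbox.append(email_from_ui)  # trusts the UI string
        return True
    return False


def safe_reset(email_from_ui: str) -> bool:
    user = find_user(email_from_ui)
    if user:
        outbox.append(user["email"])  # uses the stored, canonical value
        return True
    return False


vulnerable_reset("victim@gmàil.com")
safe_reset("victim@gmàil.com")
print(outbox)
```

Both calls match the victim's row, but only the vulnerable one queues a message for the lookalike address; the safe one queues it for the stored address.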

Defensive Refactor (Python)

import unicodedata

def normalize_email(email: str) -> str:
    """
    Convert to NFKC normalized form, ensure ASCII only,
    and return a lower‑cased string.
    """
    normalized = unicodedata.normalize('NFKC', email)

    # Reject non‑ASCII characters
    try:
        normalized.encode('ascii')
    except UnicodeEncodeError as exc:
        raise ValueError("Email contains non‑ASCII characters.") from exc

    return normalized.lower()


def reset_password(email_from_ui: str) -> bool:
    # DEFENSE 1: Normalize and validate input
    try:
        normalized_email = normalize_email(email_from_ui)
    except ValueError:
        # Reject non‑ASCII emails immediately
        return False

    cursor.execute(
        "SELECT * FROM users WHERE email = %s",
        (normalized_email,)
    )
    user = cursor.fetchone()

    if user:
        # DEFENSE 2: NEVER trust UI data
        # Always use the canonical email from the database
        database_email = user['email']
        send_reset_email(database_email)   # ✅ correct
        return True
    return False

Result:

  • The attacker’s victim@gmàil.com is rejected because it contains non‑ASCII characters.
  • Even if it passed, the reset email is sent to user['email'] (the stored, trusted value).

Detection Checklist

  • Run static analysis to find patterns where UI input is used directly in external communications (e.g., send_email(user_input)).
  • Use Unicode fuzzers to test input handling.
  • Review all authentication‑related code for raw UI data usage.
  • Verify that email validation enforces ASCII‑only (or proper IDN handling) before any lookup.
  • Ensure database collation settings are strict (utf8mb4_bin or similar) and that Unicode normalization is applied consistently.
  • Flag any code that uses the original user input after a successful DB lookup – this is the core vulnerability.
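
Where the checklist mentions proper IDN handling as an alternative to ASCII‑only validation, one possible approach is to convert the domain to its ASCII (punycode) form before any comparison. The sketch below assumes Python's built‑in `idna` codec, which implements the older IDNA 2003 rules; `domain_to_ascii` is a hypothetical helper name, and the local part is left untouched.

```python
def domain_to_ascii(email: str) -> str:
    """Return the email with its domain converted to ASCII (punycode) form."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("Not a valid email address.")
    # The built-in codec raises UnicodeError for labels it cannot encode.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain.lower()}"


lookalike = domain_to_ascii("victim@gmàil.com")
genuine = domain_to_ascii("victim@gmail.com")

print(lookalike)             # domain now carries an "xn--" label
print(lookalike == genuine)  # False
```

Once the domain is in ASCII form, the lookalike is visibly and byte‑wise different from the genuine address, so a lenient collation no longer has accented characters to fold.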

Static analysis tools can flag:

  • UI input used without normalization.
  • Bypassing DB values in favor of user‑provided strings.
  • Mismatched collation settings across components.
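
As a rough illustration of the second check, a few lines using Python's `ast` module can flag calls where a function parameter flows straight into a sender. This is a toy sketch, not a real taint analysis: the `send_` prefix is an assumed naming convention, and `find_tainted_sends` is a hypothetical helper.

```python
import ast

SENDER_PREFIX = "send_"  # assumed naming convention for outbound calls


def find_tainted_sends(source: str):
    """Return (function, line, argument) for send_* calls fed a raw parameter."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        params = {a.arg for a in func.args.args}
        for node in ast.walk(func):
            is_sender = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id.startswith(SENDER_PREFIX)
            )
            if not is_sender:
                continue
            for arg in node.args:
                # Only a bare parameter name is flagged; user["email"] is not.
                if isinstance(arg, ast.Name) and arg.id in params:
                    findings.append((func.name, node.lineno, arg.id))
    return findings


vulnerable_source = '''
def reset_password(email_from_ui):
    user = lookup(email_from_ui)
    if user:
        send_reset_email(email_from_ui)
'''

safe_source = '''
def reset_password(email_from_ui):
    user = lookup(email_from_ui)
    if user:
        send_reset_email(user["email"])
'''

print(find_tainted_sends(vulnerable_source))
print(find_tainted_sends(safe_source))
```

The first call reports the `send_reset_email(email_from_ui)` line; the second reports nothing, because the argument comes from the looked‑up record.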

Advanced Guidance

  • Bijection Requirement: There must be a one‑to‑one mapping between real‑world email addresses and their representation in your system.
  • Normalization Discipline: Apply Unicode normalization once (preferably at input validation) and store the canonical form.
  • Collation Consistency: Use a binary collation (utf8mb4_bin) for email columns to avoid accidental equivalence of distinct Unicode strings.
  • Multi‑Factor Authentication (MFA): Complement email‑based flows with MFA to mitigate the impact of any residual email delivery issues.

By enforcing strict collation, canonical storage, and never trusting UI data for security‑critical actions, you eliminate the described attack vector.
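
One way to apply the normalization discipline above is to wrap the address in a small value object that normalizes exactly once, at construction. The sketch below is an assumption‑laden illustration: `EmailAddress` is a hypothetical class, and it follows the same ASCII‑only, lower‑casing policy as the defensive refactor rather than full RFC validation.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    """Immutable, canonical email value (ASCII-only policy is an assumption)."""

    value: str

    def __post_init__(self) -> None:
        normalized = unicodedata.normalize("NFKC", self.value).strip().lower()
        if not normalized.isascii():
            raise ValueError("Email contains non-ASCII characters.")
        local, sep, domain = normalized.rpartition("@")
        if not sep or not local or not domain:
            raise ValueError("Email must look like local@domain.")
        # Frozen dataclasses need object.__setattr__ to store the canonical form.
        object.__setattr__(self, "value", normalized)


canonical = EmailAddress("Victim@Gmail.COM")
print(canonical.value)  # victim@gmail.com

try:
    EmailAddress("victim@gmàil.com")
except ValueError as error:
    print(f"rejected: {error}")
```

Because equality is based on the normalized `value`, two instances built from differently cased input compare equal, and code that accepts only `EmailAddress` never sees a raw UI string.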

Unicode Normalization Smell

AI tools sometimes generate this smell because they are pre‑trained with poor code examples and focus on basic logic without considering encoding edge cases.

How AI can fix it – give clear prompts about normalization, security vulnerabilities, and stored‑data usage.

Remember: AI assistants make lots of mistakes.

Suggested Prompt

Model email as a server‑owned value object.
Normalize once.
After database lookup, discard UI input for security actions.

Prompting Styles

The same assistants are listed under both columns, Without Proper Instructions and With Specific Instructions:

  • ChatGPT
  • Claude
  • Perplexity
  • Copilot
  • You
  • Gemini
  • DeepSeek
  • Meta AI
  • Grok
  • Qwen

Security Recommendations

  • Never use untrusted UI data for security‑critical operations (e.g., sending password‑reset emails).
  • Always normalize all user input to a canonical form and validate it strictly.
  • Use the canonical values from your database, not the user‑provided input, when performing authentication or sending security‑related communications.
  • The safest approach is to restrict email addresses to ASCII‑only characters and treat the database as the single source of truth.

Code Smells

Code Smell 189 – Not Sanitized Input

Maxi Contieri ・ Dec 28 ‘22

#webdev #beginners #programming #security

Code Smell 121 – String Validations

Maxi Contieri ・ Mar 13 ‘22

#webdev #programming #security

Disclaimer 📘

Code Smells are my opinion.

“Never trust input you do not control.” – Bruce Schneier

Photo by Aurèle Castellane on Unsplash

  • Software Engineering Great Quotes ・ Maxi Contieri ・ Dec 28 ‘20

    #codenewbie #programming #quotes #software
  • How to Find the Stinky Parts of Your Code ・ Maxi Contieri ・ May 21 ‘21

    #codenewbie #tutorial #codequality #beginners

This article is part of the CodeSmell Series.
