Code Smell 317 - Email Handling Vulnerabilities
Source: Dev.to
TL;DR
You normalize email for lookup but trust UI data for delivery, breaking identity ownership.
Problems
- UI trust
- Identity drift
- Unicode confusion
- String identity
- Boundary breach
- Collation confusion
- Security bypass
- Account takeover
- Email spoofing
Solutions
- Server owns identity
- Never trust UI input
- Use strict collation
- Use canonical emails
- Normalize once
- Persist then act
- Implement Multi‑Factor Authentication
Refactorings ⚙️
| Refactoring | Author | Date | Tags |
|---|---|---|---|
| Refactoring 019 – Reify Email Addresses | Maxi Contieri | Dec 5 ‘24 | #javascript #refactoring #designpatterns #beginners |
| Refactoring 016 – Build With The Essence | Maxi Contieri | Sep 16 ‘24 | #webdev #beginners #programming #tutorial |
| Refactoring 034 – Reify Parameters | Maxi Contieri | Oct 7 | #webdev #programming #javascript #beginners |
The Vulnerability
When you handle user input that contains Unicode characters, different system components may interpret those characters differently.
- Some database engines with certain collations (e.g., utf8mb4_unicode_ci) treat Unicode characters with diacritics as equal to their ASCII counterparts. Example: 'à' equals 'a'.
- Email servers, programming languages, and other systems distinguish between these characters.
This inconsistency creates a dangerous security vulnerability.
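To see the mismatch in isolation, the sketch below compares the two addresses in two ways. First it uses plain Python equality, which compares code point by code point. Then it uses a rough emulation of an accent-insensitive, case-insensitive collation. The fold_accents helper is hypothetical. It decomposes text with NFD and drops combining marks, which is not MySQL's actual utf8mb4_unicode_ci algorithm.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper approximating an accent- and case-insensitive key.

    Decomposes to NFD, drops combining marks (e.g. the grave accent in 'à'),
    then casefolds. This is an approximation, not MySQL's collation rules.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


stored = "victim@gmail.com"    # ASCII 'a'
from_ui = "victim@gmàil.com"   # U+00E0 'à'

# Code-point comparison (what Python and mail servers do): distinct addresses.
print(stored == from_ui)                              # False
# Accent-insensitive comparison: the two addresses collapse into one.
print(fold_accents(stored) == fold_accents(from_ui))  # True
```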
Attack Flow
- An attacker registers the lookalike domain gmàil.com (Unicode à), which gives them addresses like attacker@gmàil.com and any other mailbox on that domain.
- The attacker requests a password reset for the victim’s legitimate account victim@gmail.com (ASCII a) and fills the email field with victim@gmàil.com.
- The database collation treats both addresses as equal, so the query matches the victim’s row.
- The application mistakenly uses the untrusted UI input to send the reset email, delivering it to victim@gmàil.com, a mailbox on the attacker’s domain.
Result: The attacker gains full control of the victim’s account.
You violate the fundamental security principle: never trust data from the UI. Always use the canonical values stored in your database for security‑critical operations.
Vulnerable Code (Python)
```python
# Assumes `cursor` is a DB-API cursor returning dict-like rows
# and `send_reset_email` is the application's mailer.
def reset_password(email_from_ui):
    # email_from_ui = "victim@gmàil.com"  # attacker's Unicode address from UI
    # Database uses utf8mb4_unicode_ci collation → 'à' == 'a'
    cursor.execute(
        "SELECT * FROM users WHERE email = %s",
        (email_from_ui,)
    )
    user = cursor.fetchone()
    if user:
        # CRITICAL MISTAKE: trusting UI data
        # Sends email to the attacker's Unicode address
        send_reset_email(email_from_ui)  # ❌ should use user['email']
        return True
    return False
```
Attack Scenario
| Stored in DB | Attacker controls | UI input | Collation match? | Email sent to |
|---|---|---|---|---|
| victim@gmail.com (ASCII) | attacker@gmàil.com (Unicode) | victim@gmàil.com | Yes | victim@gmàil.com (attacker) |
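The scenario in the table can be reproduced without a MySQL server. This is a sketch under several assumptions:

- It uses Python's built-in sqlite3 module.
- A custom ACCENT_CI collation, registered with Connection.create_collation, approximates utf8mb4_unicode_ci.
- The mailer is replaced by a hypothetical send_reset_email that only records recipients.
- SQLite uses ? placeholders instead of %s.

```python
import sqlite3
import unicodedata

sent_to: list[str] = []  # every recipient the hypothetical mailer was given


def send_reset_email(address: str) -> None:
    """Hypothetical stand-in for a real mailer: only records the recipient."""
    sent_to.append(address)


def accent_ci(a: str, b: str) -> int:
    """Emulated accent- and case-insensitive collation (not MySQL's exact rules)."""
    def key(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)  # -1, 0 or 1, as sqlite3 expects


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support user["email"]
conn.create_collation("ACCENT_CI", accent_ci)
conn.execute("CREATE TABLE users (email TEXT COLLATE ACCENT_CI)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("victim@gmail.com",))
cursor = conn.cursor()


def reset_password(email_from_ui: str) -> bool:
    cursor.execute("SELECT * FROM users WHERE email = ?", (email_from_ui,))
    user = cursor.fetchone()
    if user:
        send_reset_email(email_from_ui)  # trusts UI input: the smell
        return True
    return False


print(reset_password("victim@gmàil.com"))  # True: the collation matched the victim's row
print(sent_to)                             # ['victim@gmàil.com']: the attacker's mailbox
```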
Defensive Refactor (Python)
```python
import unicodedata

# `cursor` and `send_reset_email` are assumed to be provided by the application.


def normalize_email(email: str) -> str:
    """
    Convert to NFKC normalized form, ensure ASCII only,
    and return a lower-cased string.
    """
    normalized = unicodedata.normalize('NFKC', email)
    # Reject non-ASCII characters
    try:
        normalized.encode('ascii')
    except UnicodeEncodeError as exc:
        raise ValueError("Email contains non-ASCII characters.") from exc
    return normalized.lower()


def reset_password(email_from_ui: str) -> bool:
    # DEFENSE 1: Normalize and validate input
    try:
        normalized_email = normalize_email(email_from_ui)
    except ValueError:
        # Reject non-ASCII emails immediately
        return False
    cursor.execute(
        "SELECT * FROM users WHERE email = %s",
        (normalized_email,)
    )
    user = cursor.fetchone()
    if user:
        # DEFENSE 2: NEVER trust UI data
        # Always use the canonical email from the database
        database_email = user['email']
        send_reset_email(database_email)  # ✅ correct
        return True
    return False
```
Result:
- The attacker’s victim@gmàil.com is rejected because it contains non‑ASCII characters.
- Even if it passed, the reset email is sent to user['email'] (the stored, trusted value).
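Running the refactored logic against a small harness shows both defenses working. This is a sketch with three assumptions:

- Python's sqlite3 module stands in for the database.
- A custom ACCENT_CI collation, registered through Connection.create_collation, stands in for utf8mb4_unicode_ci.
- A hypothetical send_reset_email only records recipients.

Even with the lenient collation left in place, the lookalike is rejected, and the only address ever mailed is the stored one.

```python
import sqlite3
import unicodedata

sent_to: list[str] = []  # recipients the hypothetical mailer was asked to use


def send_reset_email(address: str) -> None:
    """Hypothetical mailer stand-in: only records the recipient."""
    sent_to.append(address)


def accent_ci(a: str, b: str) -> int:
    """Emulated lenient collation, kept to show the fix holds even with it."""
    def key(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


def normalize_email(email: str) -> str:
    """NFKC-normalize, reject non-ASCII, lower-case."""
    normalized = unicodedata.normalize("NFKC", email)
    try:
        normalized.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError("Email contains non-ASCII characters.") from exc
    return normalized.lower()


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.create_collation("ACCENT_CI", accent_ci)
conn.execute("CREATE TABLE users (email TEXT COLLATE ACCENT_CI)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("victim@gmail.com",))
cursor = conn.cursor()


def reset_password(email_from_ui: str) -> bool:
    try:
        normalized_email = normalize_email(email_from_ui)  # DEFENSE 1
    except ValueError:
        return False
    cursor.execute("SELECT * FROM users WHERE email = ?", (normalized_email,))
    user = cursor.fetchone()
    if user:
        send_reset_email(user["email"])  # DEFENSE 2: stored, trusted value
        return True
    return False


print(reset_password("victim@gmàil.com"))  # False: rejected before the lookup
print(reset_password("Victim@Gmail.com"))  # True
print(sent_to)                             # ['victim@gmail.com']
```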
Detection Checklist
- Run static analysis to find patterns where UI input is used directly in external communications (e.g., send_email(user_input)).
- Use Unicode fuzzers to test input handling.
- Review all authentication‑related code for raw UI data usage.
- Verify that email validation enforces ASCII‑only addresses (or proper IDN handling) before any lookup.
- Ensure database collation settings are strict (utf8mb4_bin or similar) and that Unicode normalization is applied consistently.
- Flag any code that uses the original user input after a successful DB lookup; this is the core vulnerability.
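A Unicode fuzzer does not need to be elaborate to catch this smell. The sketch below is hypothetical in several ways:

- LOOKALIKES is a tiny hand-picked substitution table. Real fuzzers use much larger confusable sets.
- vulnerable_reset and safe_reset are toy targets. They return the recipient instead of sending mail.

The property checked comes from the last checklist item: no input may ever produce a recipient other than the stored address.

```python
import unicodedata
from typing import Callable, Iterator, List, Optional

# Hypothetical, hand-picked substitution table (real fuzzers use larger
# confusable sets). Every entry decomposes to its ASCII base letter under NFD.
LOOKALIKES = {"a": "àáâä", "e": "èéêë", "i": "ìíîï", "o": "òóôö"}

STORED = "victim@gmail.com"  # the address the database holds


def lookalike_variants(email: str) -> Iterator[str]:
    """Yield each variant of `email` with exactly one letter swapped for a lookalike."""
    for index, char in enumerate(email):
        for substitute in LOOKALIKES.get(char, ""):
            yield email[:index] + substitute + email[index + 1:]


def fold(text: str) -> str:
    """Rough accent- and case-insensitive key, emulating a lenient collation."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def vulnerable_reset(email_from_ui: str) -> Optional[str]:
    """Toy target: lenient lookup, then 'mails' the UI value. Returns the recipient."""
    return email_from_ui if fold(email_from_ui) == fold(STORED) else None


def safe_reset(email_from_ui: str) -> Optional[str]:
    """Toy target: rejects non-ASCII input and 'mails' the stored value."""
    if not email_from_ui.isascii():
        return None
    return STORED if fold(email_from_ui) == fold(STORED) else None


def fuzz(reset: Callable[[str], Optional[str]]) -> List[str]:
    """Return every recipient that is not the stored address (should be empty)."""
    leaks = []
    for variant in lookalike_variants(STORED):
        recipient = reset(variant)
        if recipient is not None and recipient != STORED:
            leaks.append(recipient)
    return leaks


print(len(fuzz(vulnerable_reset)))  # 20: every lookalike leaks
print(fuzz(safe_reset))             # []
```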
Static analysis tools can flag:
- UI input used without normalization.
- Bypassing DB values in favor of user‑provided strings.
- Mismatched collation settings across components.
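The first flag above can be approximated with Python's standard ast module. This is a deliberately naive sketch, not a real linter. The function name find_ui_input_sent is hypothetical. So is the convention that outbound calls start with send. The sketch also treats every function parameter as untrusted UI input.

```python
import ast

SOURCE = '''
def reset_password(email_from_ui):
    user = find_user(email_from_ui)
    if user:
        send_reset_email(email_from_ui)

def reset_password_safe(email_from_ui):
    user = find_user(email_from_ui)
    if user:
        send_reset_email(user["email"])
'''


def find_ui_input_sent(source: str, prefix: str = "send") -> list[tuple[str, int]]:
    """Flag calls like send_*(param) where param is a raw function parameter.

    Heuristic sketch: every parameter counts as UI input and every callee
    whose name starts with `prefix` counts as an outbound channel.
    """
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = {arg.arg for arg in func.args.args}
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id.startswith(prefix)
                and any(isinstance(a, ast.Name) and a.id in params for a in node.args)
            ):
                findings.append((func.name, node.lineno))
    return findings


print(find_ui_input_sent(SOURCE))  # [('reset_password', 5)]
```

Only the first function is flagged. The safe variant passes user["email"], a subscript of the looked-up record, rather than the raw parameter.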
Advanced Guidance
- Bijection Requirement: There must be a one‑to‑one mapping between real‑world email addresses and their representation in your system.
- Normalization Discipline: Apply Unicode normalization once (preferably at input validation) and store the canonical form.
- Collation Consistency: Use a binary collation (utf8mb4_bin) for email columns to avoid accidental equivalence of distinct Unicode strings.
- Multi‑Factor Authentication (MFA): Complement email‑based flows with MFA to mitigate the impact of any residual email delivery issues.
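Collation consistency and normalization discipline work together. A binary collation stops accidental equivalence, but it is also case-sensitive. Every value must therefore pass through one normalization function before it is stored or looked up.

This is a sketch, assuming SQLite's default BINARY collation (a memcmp comparison) as a stand-in for utf8mb4_bin. Neither treats 'à' and 'a' as equal. canonical_email and lookup are hypothetical helpers.

```python
import sqlite3
import unicodedata


def canonical_email(raw: str) -> str:
    """Hypothetical single normalization point: NFKC, ASCII-only, lower-case."""
    normalized = unicodedata.normalize("NFKC", raw)
    if not normalized.isascii():
        raise ValueError("Email contains non-ASCII characters.")
    return normalized.lower()


conn = sqlite3.connect(":memory:")
# BINARY is SQLite's default; it is spelled out here to mirror utf8mb4_bin.
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE COLLATE BINARY)")
# Normalize once, at write time, and store only the canonical form.
conn.execute("INSERT INTO users VALUES (?)", (canonical_email("Victim@Gmail.com"),))


def lookup(email: str):
    """Raw lookup with no normalization, to expose the collation's behavior."""
    row = conn.execute("SELECT email FROM users WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None


print(lookup("victim@gmàil.com"))                   # None: no accent folding
print(lookup("Victim@Gmail.com"))                   # None: no case folding either
print(lookup(canonical_email("Victim@Gmail.com")))  # victim@gmail.com
```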
By enforcing strict collation, canonical storage, and never trusting UI data for security‑critical actions, you eliminate the described attack vector.
Unicode Normalization Smell
AI tools sometimes generate this smell because they are pre-trained on poor code examples and focus on basic logic without considering encoding edge cases.
How AI can fix it: give clear prompts about normalization, security vulnerabilities, and using stored data instead of UI input.
Remember: AI assistants make lots of mistakes.
Suggested Prompt
Model email as a server‑owned value object.
Normalize once.
After database lookup, discard UI input for security actions.
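The first two lines of that prompt can be sketched as a small value object, in the spirit of Refactoring 019 (Reify Email Addresses). This Email class is hypothetical, not code from that refactoring. It normalizes exactly once, at construction, rejects non-ASCII input, and is immutable. As a result, no un-normalized string can later pass itself off as an email. The single '@' check is only a placeholder for real validation.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Hypothetical server-owned value object: constructing it normalizes it."""

    value: str

    def __post_init__(self) -> None:
        normalized = unicodedata.normalize("NFKC", self.value)
        # Placeholder checks only: ASCII-only and exactly one '@'.
        if not normalized.isascii() or normalized.count("@") != 1:
            raise ValueError(f"Not a canonical email: {self.value!r}")
        # Frozen dataclass: bypass __setattr__ to store the canonical form once.
        object.__setattr__(self, "value", normalized.lower())

    def __str__(self) -> str:
        return self.value


print(Email("Victim@Gmail.com") == Email("victim@gmail.com"))  # True
try:
    Email("victim@gmàil.com")
except ValueError as error:
    print(error)  # Not a canonical email: 'victim@gmàil.com'
```

Because the dataclass is frozen with generated equality, Email instances are hashable. They can key a user index directly, which keeps lookups on the canonical value.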
Prompting Styles
| Without Proper Instructions | With Specific Instructions |
|---|---|
| ChatGPT | ChatGPT |
| Claude | Claude |
| Perplexity | Perplexity |
| Copilot | Copilot |
| You | You |
| Gemini | Gemini |
| DeepSeek | DeepSeek |
| Meta AI | Meta AI |
| Grok | Grok |
| Qwen | Qwen |
Security Recommendations
- Never use untrusted UI data for security‑critical operations (e.g., sending password‑reset emails).
- Always normalize all user input to a canonical form and validate it strictly.
- Use the canonical values from your database, not the user‑provided input, when performing authentication or sending security‑related communications.
- The safest approach is to restrict email addresses to ASCII‑only characters and treat the database as the single source of truth.
Code Smells
Code Smell 189 – Not Sanitized Input
Maxi Contieri ・ Dec 28 ‘22
#webdev #beginners #programming #security
Code Smell 121 – String Validations
Maxi Contieri ・ Mar 13 ‘22
#webdev #programming #security
Disclaimer 📘
Code Smells are my opinion.
“Never trust input you do not control.” – Bruce Schneier
Photo by Aurèle Castellane on Unsplash
Related Articles
- Software Engineering Great Quotes – Maxi Contieri ・ Dec 28 ‘20 #codenewbie #programming #quotes #software
- How to Find the Stinky Parts of Your Code – Maxi Contieri ・ May 21 ‘21 #codenewbie #tutorial #codequality #beginners
This article is part of the CodeSmell Series.