How anagram solvers actually work: algorithms behind the scenes

Published: April 21, 2026 at 07:16 PM EDT
4 min read
Source: Dev.to

The Canonical Sorted‑Key Approach

The most efficient way to find exact anagrams is to normalize the dictionary. If two words are anagrams, they contain exactly the same characters with the same frequencies. Therefore, if you sort the letters of any word alphabetically, every anagram of that word produces the same “key.”

Example: “listen” and “silent” both become eilnst when sorted.

The Implementation

We preprocess our dictionary into a hash map (or dictionary in Python) where the key is the sorted string and the value is a list of matching words.

from collections import defaultdict

def build_anagram_index(word_list):
    index = defaultdict(list)
    for word in word_list:
        # Canonicalize: sort the letters
        key = "".join(sorted(word.lower()))
        index[key].append(word)
    return index

# Lookup: sort the query word (O(L log L)), then a single O(1) hash-map probe
def find_anagrams(word, index):
    key = "".join(sorted(word.lower()))
    return index.get(key, [])

The lookup time is O(L log L), where L is the length of the query word (dominated by the sorting step); the hash‑map probe itself is O(1). Building the index is a one‑time O(n · L log L) pass over a dictionary of n words.
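As a quick illustration of the canonical mapping (the tiny word list here is mine, purely for demonstration):

```python
from collections import defaultdict

words = ["listen", "silent", "enlist", "inlets", "google"]
index = defaultdict(list)
for w in words:
    index["".join(sorted(w))].append(w)

# Any permutation of the same letters lands in the same bucket
print(index["".join(sorted("tinsel"))])  # ['listen', 'silent', 'enlist', 'inlets']
```

Note that words with no anagrams in the list, like “google,” simply sit alone in their bucket.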

Try it live to see how this canonical mapping handles complex dictionary lookups in real‑time.

Moving Beyond Exact Matches: The Trie

The sorted‑key approach is perfect for finding full‑word anagrams, but what if you want to find words that can be formed from a subset of your letters? Or support “wildcard” tiles? This is where the Trie (prefix tree) shines.

A Trie stores words as a tree in which each node represents a character. Instead of sorting, you traverse the tree: with the letters “a, r, t,” you start at the root and move to the a branch, then r, then t. If you land on a node marked as a “word end,” you’ve found a valid word.

Why use a Trie?

Tries are excellent for partial matches and prefix searching. If you are building a game like Boggle or a crossword helper, you can prune the search space early. If the current path in your Trie doesn’t exist, you stop searching that branch immediately.

class TrieNode {
  constructor() {
    this.children = {};
    this.isEndOfWord = false;
  }
}

// Build the tree one word at a time
function insert(root, word) {
  let node = root;
  for (const char of word) {
    if (!node.children[char]) node.children[char] = new TrieNode();
    node = node.children[char];
  }
  node.isEndOfWord = true;
}

// Collect every word that can be formed from a multiset of letters.
// `letters` maps each character to its remaining count.
function findWords(node, letters, currentWord, results) {
  if (node.isEndOfWord) results.push(currentWord);

  for (const char in node.children) {
    if (letters[char] > 0) {
      letters[char]--; // consume the tile
      findWords(node.children[char], letters, currentWord + char, results);
      letters[char]++; // backtrack: put it back
    }
  }
}

This method is significantly more flexible than the sorted‑key method, though it requires more memory to store the tree structure. For a visual look at how these letter‑based constraints work in practice, check out lettersintowords.com.
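The JavaScript sketch above doesn’t show wildcard support. Here is a minimal Python version of the same backtracking search extended with blank tiles; the “?” convention and the prefer‑real‑tiles‑first choice are my assumptions, not from any particular engine:

```python
from collections import Counter

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def find_words(node, counts, prefix, results):
    if node.is_word:
        results.add(prefix)
    for ch, child in node.children.items():
        if counts.get(ch, 0) > 0:        # use a real tile first (keeps the blank free)
            counts[ch] -= 1
            find_words(child, counts, prefix + ch, results)
            counts[ch] += 1
        elif counts.get("?", 0) > 0:     # otherwise spend a blank tile
            counts["?"] -= 1
            find_words(child, counts, prefix + ch, results)
            counts["?"] += 1

root = TrieNode()
for w in ("art", "rat", "tar", "cat"):
    insert(root, w)

results = set()
find_words(root, Counter("a?t"), "", results)
print(sorted(results))  # ['art', 'cat', 'rat', 'tar']
```

Preferring a real tile over the blank is safe here: keeping the blank in reserve can only enlarge the set of words reachable deeper in the tree.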

The Bit‑Vector Optimization

If you are working in a memory‑constrained environment or need to perform millions of subset checks per second, you can represent a word’s character count using a bit‑vector.

Since there are only 26 letters in the English alphabet, you can map each letter to a fixed position. A simple presence/absence bit‑mask can quickly rule out words that use a letter you don’t hold, but anagram and subset checks need frequencies, so you store a count per letter in a fixed‑size array (or pack small counts into a single 64‑bit integer, a couple of bits per letter).

Checking whether word A’s letters are a subset of word B’s then becomes an element‑wise comparison:

wordA_counts[i] <= wordB_counts[i]  // for all i in 0..25

This is the “secret sauce” behind high‑performance engines that need to check if a rack of tiles can form a specific word without traversing a massive tree structure.
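A concrete sketch of the count‑vector subset test in Python (the function names here are mine):

```python
# 26-slot count vector for lowercase a-z
def letter_counts(word):
    counts = [0] * 26
    for ch in word.lower():
        counts[ord(ch) - ord("a")] += 1
    return counts

# True if every letter of `word` is available on the `rack`
def rack_can_form(rack, word):
    rack_counts = letter_counts(rack)
    return all(need <= have for need, have in zip(letter_counts(word), rack_counts))

print(rack_can_form("tinsels", "listen"))  # True: the rack covers every letter
print(rack_can_form("tins", "listen"))     # False: no 'l' or 'e' on the rack
```

Because the vectors are fixed‑size, this check is branch‑friendly and trivially vectorizable, which is exactly why it suits millions of checks per second.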

What I’d Try Next

If you are scaling to a massive dictionary (e.g., the Scrabble Tournament Word List), consider Bloom filters. They allow you to check whether a word might exist in the dictionary with very little memory overhead, acting as a fast‑fail layer before hitting the more expensive Trie or hash map.
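As a sketch of that fast‑fail layer (the bit‑array sizing and the salted‑SHA‑256 hashing scheme below are my choices, not a recommendation from any particular engine):

```python
import hashlib

class BloomFilter:
    """Probabilistic set: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, word):
        # Derive k bit positions from salted SHA-256 digests
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # False means definitely absent; True means a Trie/hash lookup is worthwhile
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word))

bf = BloomFilter()
for w in ("listen", "silent", "enlist"):
    bf.add(w)
print(bf.might_contain("listen"))  # True: Bloom filters never give false negatives
```

Inserted words always report True; the trade‑off is a small, tunable false‑positive rate for words you never added.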

Experiment with DAWGs (Directed Acyclic Word Graphs) as well. A DAWG is essentially a compressed Trie. Since many words share suffixes (like “‑ing” or “‑tion”), a DAWG merges these nodes, drastically reducing the memory footprint while maintaining the same lookup speed as a standard Trie.

Whether you choose the simplicity of the sorted‑key hash map or the power of a Trie, the key is understanding the trade‑off between your dictionary’s memory footprint and the speed of your search. Happy coding!
