I Analysed 200 PRs in Shadcn-UI/UI To Find Duplicates: It Went Surprisingly Well.

Published: April 19, 2026 at 05:41 PM EDT
6 min read
Source: Dev.to

PR Redundancy Audit – shadcn‑ui/ui

This project was inspired by a tweet from Pete about the flood of PRs on high‑traffic repos like OpenClaw. AI agents are great for coding, but they also generate duplicate logic and PRs that don’t align with a maintainer’s long‑term vision. The project audits PRs for that kind of redundancy and tags them accordingly.

Goal

Identify when two (or more) contributors solve the same functional problem in completely different ways, often across disjoint files.

The system isn’t looking for copy‑pasted lines; it evaluates the architectural goal. When a match is found, it classifies the PR into one of three buckets:

| Bucket | Description |
| --- | --- |
| SHADOW | An exact duplicate fix for the same regression. |
| SUPERSET | A broader architectural fix that subsumes a smaller, specific one. |
| COMPETING | Two different paths taken to solve the same functional outcome. |
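
The three buckets map naturally onto a small data model. The following is a minimal sketch, not the project's actual code: the names `Bucket`, `RedundancyMatch`, and `format_label` are hypothetical, and the example values are taken from the audit results later in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    """The three redundancy buckets described in the post."""
    SHADOW = "exact duplicate fix for the same regression"
    SUPERSET = "broader fix that subsumes a smaller, specific one"
    COMPETING = "different path to the same functional outcome"


@dataclass(frozen=True)
class RedundancyMatch:
    """One flagged pair of PRs (hypothetical record shape)."""
    pr_number: int       # the PR being audited
    primary_match: int   # the PR it overlaps with
    bucket: Bucket
    rationale: str       # short explanation for the maintainer


def format_label(match: RedundancyMatch) -> str:
    """Render a compact tag that a bot could attach to a PR."""
    return f"#{match.pr_number} -> #{match.primary_match} [{match.bucket.name}]"


example = RedundancyMatch(
    pr_number=10404,
    primary_match=10401,
    bucket=Bucket.SHADOW,
    rationale="Identical null-check for event.key crash in Hotkeys.",
)
print(format_label(example))
```

Keeping the rationale next to the bucket means a maintainer can check the reasoning without re-running the audit.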

All of the examples below aim to fix the broken /blocks page link.

Sample PRs (same functional failure, different files)

| PR # | Title | Strategy | File Modified |
| --- | --- | --- | --- |
| #10156 | fix: update broken link on /blocks page | Simple URL replacement | apps/www/config/docs.ts |
| #10088 | fix(docs): absolute path for blocks link | Path normalization | apps/www/lib/utils.ts |
| #10096 | chore: rename internal block references | Refactoring the reference key | apps/www/registry/registry.json |

Even though the changes touch completely disjoint files (Config vs. Utils vs. Registry), the system identified that all three target the same functional failure – Goal Duplication.
PR #10088 solved the root cause (renaming the file), rendering the documentation fixes in #10156 and #10096 redundant before they were merged.

Audit Results

  • 200 recent PRs were scanned.
  • 69 valid redundancies were flagged.
  • Below are some of the most interesting matches.

| PR Identity | Primary Match | Categorisation | Why It Matters |
| --- | --- | --- | --- |
| #10404 – ThemeHotkey guard | #10401 | SHADOW | Identical null‑check for event.key crash in Hotkeys. |
| #9895 – Docs Copy Button | #9876 | SHADOW | Identical split of bash cmd/text to fix the copy button. |
| #10421 – DataTable A11y | #10402 | SHADOW | Concurrent addition of aria-labels to Data Tables. |
| #10403 – Drawer asChild fix | #10139 | SHADOW | Adding asChild to Drawer docs to fix broken nesting. |
| #10424 – Monorepo CLI fix | #10258 | SUPERSET | A broader “fix‑at‑once” strategy for the monorepo CLI. |
| #10393 – Geist Font Mismatch | #10273 | SUPERSET | More robust font mapping than #10393. |
| #10244 – Calendar Responsive | #10235 | COMPETING | Different CSS strategies for Calendar width responsiveness. |
| #10386 – ThemeHotkey bug | #10404 | SHADOW | Identical logic‑level fix for an undefined key crash during autofill. |
| #10383 – FieldSeparator fix | #10201 | SHADOW | Both PRs modify the same property to fix separator inheritance. |
| #10158 – iOS Date Input | #10133 | COMPETING | Global CSS vs. component‑level fix for the same iOS rendering bug. |

Implementation Overview

1. Back‑fill script (historical audit)

  1. Paginate PRs with Octokit, targeting critical branches (main, master).
  2. Compress & filter massive diffs to stay within free‑tier limits:
    • Strip known large files (SVGs, lockfiles, docs).
    • Remove comments & unchanged imports.
    • If still > 1500 chars, keep only the modified hunks (+/- lines).
  3. Vectorise each cleaned PR using Gemini embedding models and store in Upstash Vector.
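
The compression step can be sketched in a few lines. This is an illustration under stated assumptions rather than the actual back‑fill script: the skip lists, the comment heuristics, and the function names are all hypothetical, and only the 1500‑character threshold comes from the steps above. The sample diff content is made up for the demo.

```python
import re

MAX_CHARS = 1500  # threshold quoted in the post
# Assumed skip lists; the real filter may differ.
SKIP_SUFFIXES = (".svg", ".lock", ".md", ".mdx")
SKIP_NAMES = ("package-lock.json", "pnpm-lock.yaml", "yarn.lock")


def split_by_file(diff: str) -> list[tuple[str, str]]:
    """Split a unified diff into (path, chunk) pairs on 'diff --git' headers."""
    chunks = []
    for part in re.split(r"(?m)^(?=diff --git )", diff):
        if not part.strip():
            continue
        header = re.match(r"diff --git a/(\S+) b/(\S+)", part)
        chunks.append((header.group(2) if header else "", part))
    return chunks


def is_noise_file(path: str) -> bool:
    name = path.rsplit("/", 1)[-1]
    return name in SKIP_NAMES or path.endswith(SKIP_SUFFIXES)


def drop_line(line: str) -> bool:
    """Heuristic: drop comment lines and unchanged (context) imports."""
    marker, body = line[:1], line[1:].strip()
    if marker not in ("+", "-", " "):
        return False  # keep headers such as '@@' and 'diff --git'
    if body.startswith(("//", "/*", "*/")):
        return True
    return marker == " " and body.startswith("import ")


def compress_diff(diff: str, max_chars: int = MAX_CHARS) -> str:
    kept: list[str] = []
    for path, chunk in split_by_file(diff):
        if is_noise_file(path):
            continue
        kept.extend(l for l in chunk.splitlines() if not drop_line(l))
    text = "\n".join(kept)
    if len(text) > max_chars:
        # Still too long: keep only added/removed lines, not file headers.
        changed = [l for l in kept
                   if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
        text = "\n".join(changed)
    return text


sample = "\n".join([
    "diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml",
    "+lots of lockfile noise",
    "diff --git a/apps/www/config/docs.ts b/apps/www/config/docs.ts",
    "--- a/apps/www/config/docs.ts",
    "+++ b/apps/www/config/docs.ts",
    "@@ -1,3 +1,3 @@",
    " import { x } from 'y'",
    "+// fix link",
    '-  href: "/blocks-old",',
    '+  href: "/blocks",',
])
print(compress_diff(sample, max_chars=10))
```

Filtering by file first is the cheap win: a single lockfile change can dwarf the rest of a PR and would otherwise dominate the embedding.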

2. Real‑time bot (live triage)

  1. When a new PR arrives, query the vector store for the 8 most similar candidates.
  2. Pass those candidates to an LLM reasoning loop to determine intent and assign a bucket (SHADOW / SUPERSET / COMPETING).
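
The two stages of the live triage can be illustrated without any external service. In the sketch below, an in‑memory dictionary stands in for the vector store, the 2‑D vectors are toy values rather than real embeddings, and `build_prompt` with its extra `NONE` option is an assumption about what the LLM stage might receive. Only the candidate count of 8 and the three bucket names come from the post.

```python
import math

TOP_K = 8  # candidate count quoted in the post


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(query: list[float], store: dict[int, list[float]],
                   k: int = TOP_K) -> list[tuple[int, float]]:
    """Stage 1: rank stored PR vectors by similarity and keep the best k."""
    scored = [(pr, cosine(query, vec)) for pr, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(new_summary: str, candidates: list[tuple[int, float]],
                 summaries: dict[int, str]) -> str:
    """Stage 2 input: ask the model to judge intent, not textual overlap."""
    lines = [
        "Classify the NEW PR against each candidate as "
        "SHADOW, SUPERSET, COMPETING, or NONE.",
        f"NEW PR: {new_summary}",
    ]
    for pr, score in candidates:
        lines.append(f"CANDIDATE #{pr} (similarity {score:.2f}): {summaries[pr]}")
    return "\n".join(lines)


# Toy vectors; real embeddings have hundreds of dimensions.
store = {10156: [1.0, 0.0], 10088: [0.9, 0.1], 10244: [0.0, 1.0]}
summaries = {
    10156: "fix: update broken link on /blocks page",
    10088: "fix(docs): absolute path for blocks link",
    10244: "Calendar Responsive",
}
hits = top_candidates([1.0, 0.0], store, k=2)
print(build_prompt("chore: rename internal block references", hits, summaries))
```

The vector stage only narrows the field; the classification itself is left to the reasoning step, which is why a cheap similarity measure is acceptable here.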

3. Handling Rate Limits

  • Router automatically pivots between providers (Gemini, Llama, OpenRouter, etc.).
  • 3‑retry logic with exponential back‑off for 503/429 errors.
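
A rough sketch of the retry and routing behaviour follows. It assumes that "3‑retry" means one initial attempt plus up to three retries, and that each provider is a plain callable raising a hypothetical `ProviderError` carrying the HTTP status. None of these names come from the project.

```python
import time
from typing import Callable, Sequence

RETRYABLE = {429, 503}  # status codes named in the post
MAX_RETRIES = 3


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def call_with_backoff(call: Callable[[str], str], prompt: str,
                      retries: int = MAX_RETRIES, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry a single provider, doubling the delay after each failure."""
    for attempt in range(retries + 1):
        try:
            return call(prompt)
        except ProviderError as err:
            if err.status not in RETRYABLE or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def route(providers: Sequence[Callable[[str], str]], prompt: str, **kwargs) -> str:
    """Try providers in order, moving on once one exhausts its retries."""
    last_error = None
    for call in providers:
        try:
            return call_with_backoff(call, prompt, **kwargs)
        except ProviderError as err:
            last_error = err
    raise RuntimeError("all providers failed") from last_error


def always_limited(prompt: str) -> str:
    raise ProviderError(429)


def healthy(prompt: str) -> str:
    return "SHADOW"


delays: list[float] = []
# Record delays instead of sleeping so the demo runs instantly.
result = route([always_limited, healthy], "classify this PR", sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the back‑off schedule testable without waiting for real time to pass.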

Lessons Learned

| Issue | Observation | Mitigation |
| --- | --- | --- |
| Vector Gap | PRs solving the same problem in very different ways sometimes never surface in the vector search, so they never reach the LLM. | Added a fallback “semantic‑keyword” index (e.g., extracting domain‑specific terms) to broaden recall. |
| Structural Bias | Early models flagged unrelated JSON additions as duplicates because their shape looked similar. | Prioritised literal values (IDs, URLs) over pure syntax when building embeddings. |
| Model Quota | Exhausted higher‑tier AI quota and fell back to a smaller 8B model, which introduced more bias. | Implemented a budget‑aware scheduler that prefers cheaper providers only when similarity scores are already high. |
| False Positives | Vector similarity alone produced many spurious matches. | The LLM reasoning stage now performs a goal‑alignment check before final categorisation. |

Takeaways

  • Historical clustering of redundant PRs is an excellent stress test for any detection engine.
  • A two‑stage pipeline (vector similarity → LLM reasoning) balances speed and accuracy.
  • Rate‑limit‑aware design that fits within free‑tier budgets is essential for open‑source tooling.
  • Even with sophisticated tooling, human review remains the final arbiter, especially for edge‑case “COMPETING” fixes.

If you’d like to see the code or run the audit on another repo, feel free to open an issue or drop a PR!

Problem

The system started flagging totally new registry entries as duplicates just because they looked similar at a JSON‑structure level. It was effectively ignoring the actual URL values because it wasn’t smart enough to weigh content over structure.
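
To see why structure alone misleads, compare two registry‑style entries that share a shape but differ in every value. The sketch below assumes a hypothetical entry format and hypothetical helper names; it only illustrates the idea of putting literal values ahead of keys in the text that gets embedded.

```python
import json


def leaves(node, path=""):
    """Yield (path, value) for every scalar leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaves(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for child in node:
            yield from leaves(child, f"{path}[]")
    else:
        yield path, node


def shape_signature(doc) -> list[str]:
    """Structure only: the sorted set of key paths, with values ignored."""
    return sorted({path for path, _ in leaves(doc)})


def embedding_text(doc) -> str:
    """Value-first text: literals lead, key paths follow as context."""
    pairs = list(leaves(doc))
    values = " ".join(str(value) for _, value in pairs)
    keys = " ".join(sorted({path for path, _ in pairs}))
    return f"values: {values}\nkeys: {keys}"


# Hypothetical registry entries with identical shape, different content.
entry_a = json.loads('{"name": "login-01", "files": [{"path": "blocks/login-01.tsx"}]}')
entry_b = json.loads('{"name": "sidebar-07", "files": [{"path": "blocks/sidebar-07.tsx"}]}')

print(shape_signature(entry_a) == shape_signature(entry_b))
print(embedding_text(entry_a))
```

The two entries are indistinguishable by shape, so any signal that separates them has to come from the values.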

Weak on Wide Sweeps

When looking across large PRs, the system sometimes saw relationships that weren’t really there. If two big PRs happened to touch the same package, it could hallucinate a connection between them, even if, logically, they had nothing to do with each other.

Solution

The backfill engine is now the analytical core for what’s next: a live GitHub Bot. By using the historical memory we’ve built, the bot can analyze a new PR the second it’s opened and alert maintainers if a redundant fix already exists.

Future Work

I’m also exploring a Maintainer Dashboard to visualize these semantic clusters, giving project maintainers a high‑level view of where their contributors are accidentally overlapping.

Call to Action

If you’re a maintainer interested in trying it out on your repo or a developer who wants to contribute, hit me up. I’d love to chat.
