I Analysed 200 PRs in Shadcn-UI/UI To Find Duplicates: It Went Surprisingly Well.

Published: 3 weeks ago (April 19, 2026 at 05:41 PM EDT)

6 min read

Source: Dev.to

PR Redundancy Audit – shadcn‑ui/ui

This project was inspired by a tweet from Pete about the flood of PRs on high‑traffic repos like OpenClaw. AI agents are great for coding, but they also generate duplicate logic and PRs that don’t align with a maintainer’s long‑term vision. The tool audits PRs and tags each redundant one with the kind of overlap it found.

Goal

Identify when two (or more) contributors solve the same functional problem in completely different ways, often across disjoint files.

The system isn’t looking for copy‑pasted lines; it evaluates the architectural goal. When a match is found, it classifies the PR into one of three buckets:

Bucket	Description
SHADOW	An exact duplicate fix for the same regression.
SUPERSET	A broader architectural fix that subsumes a smaller, specific one.
COMPETING	Two different paths taken to solve the same functional outcome.
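
The three buckets map naturally onto a small union type. Below is a minimal sketch, assuming the LLM returns its verdict as free text; `Bucket`, `BUCKETS`, and `parseBucket` are hypothetical names, not taken from the project.

```typescript
// Hypothetical names: the post does not show the project's actual types.
type Bucket = "SHADOW" | "SUPERSET" | "COMPETING";

const BUCKETS: readonly Bucket[] = ["SHADOW", "SUPERSET", "COMPETING"];

// Returns the single bucket named in a model reply, or null when the reply
// names none (or more than one), so ambiguous verdicts can go to a human.
function parseBucket(reply: string): Bucket | null {
  const upper = reply.toUpperCase();
  const found = BUCKETS.filter((b) => new RegExp(`\\b${b}\\b`).test(upper));
  return found.length === 1 ? (found[0] ?? null) : null;
}
```

Returning `null` for ambiguous replies is a design choice in this sketch: it keeps an unclear verdict from being silently forced into a bucket.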

All of the examples below aim to fix the broken /blocks page link.

Sample PRs (same functional failure, different files)

PR #	Title	Strategy	File Modified
#10156	`fix: update broken link on /blocks page`	Simple URL replacement	`apps/www/config/docs.ts`
#10088	`fix(docs): absolute path for blocks link`	Path normalization	`apps/www/lib/utils.ts`
#10096	`chore: rename internal block references`	Refactoring the reference key	`apps/www/registry/registry.json`

Even though the changes touch completely disjoint files (Config vs. Utils vs. Registry), the system identified that all three target the same functional failure – Goal Duplication.
PR #10088 solved the root cause (renaming the file), rendering the documentation fixes in #10156 and #10096 redundant before they were merged.

Audit Results

200 recent PRs were scanned.
69 valid redundancies were flagged.
Below are some of the most interesting matches.

PR Identity	Primary Match	Categorisation	Why It Matters
#10404 – ThemeHotkey guard	#10401	SHADOW	Identical null‑check for `event.key` crash in Hotkeys.
#9895 – Docs Copy Button	#9876	SHADOW	Identical split of bash cmd/text to fix the copy button.
#10421 – DataTable A11y	#10402	SHADOW	Concurrent addition of `aria-label`s to Data Tables.
#10403 – Drawer `asChild` fix	#10139	SHADOW	Adding `asChild` to Drawer docs to fix broken nesting.
#10424 – Monorepo CLI fix	#10258	SUPERSET	A broader “fix‑at‑once” strategy for the monorepo CLI.
#10393 – Geist Font Mismatch	#10273	SUPERSET	More robust font mapping than #10393.
#10244 – Calendar Responsive	#10235	COMPETING	Different CSS strategies for Calendar width responsiveness.
#10386 – ThemeHotkey bug	#10404	SHADOW	Identical logic‑level fix for an undefined key crash during autofill.
#10383 – FieldSeparator fix	#10201	SHADOW	Both PRs modify the same property to fix separator inheritance.
#10158 – iOS Date Input	#10133	COMPETING	Global CSS vs. component‑level fix for the same iOS rendering bug.

Implementation Overview

1. Back‑fill script (historical audit)

Paginate PRs with Octokit, targeting critical branches (main, master).
Compress & filter massive diffs to stay within free‑tier limits:
- Strip known large files (SVGs, lockfiles, docs).
- Remove comments & unchanged imports.
- If still > 1500 chars, keep only the modified hunks (+/- lines).
Vectorise each cleaned PR using Gemini embedding models and store in Upstash Vector.
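
The compression steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the `FileDiff` shape, the `SKIP_FILE` pattern, and the comment heuristics are all hypothetical. Only the 1500-character threshold comes from the description above.

```typescript
// Hypothetical filter list: SVGs, lockfiles, and Markdown/MDX docs.
const SKIP_FILE = /\.(svg|lock|mdx?)$|(^|\/)(package-lock\.json|pnpm-lock\.yaml)$/;
const MAX_CHARS = 1500; // threshold quoted in the post

interface FileDiff {
  filename: string;
  patch: string; // unified diff text for one file
}

function compressDiff(files: FileDiff[]): string {
  const kept = files.filter((f) => !SKIP_FILE.test(f.filename));

  // Pass 1: drop comment lines and unchanged (context) import lines.
  // The comment check is a rough heuristic for JS/TS-style comments.
  const cleaned = kept
    .map((f) =>
      f.patch
        .split("\n")
        .filter((line) => {
          const body = line.replace(/^[+\- ]/, "").trim();
          const isComment =
            body.startsWith("//") || body.startsWith("/*") || body.startsWith("*");
          const isUnchangedImport = line.startsWith(" ") && body.startsWith("import ");
          return !isComment && !isUnchangedImport;
        })
        .join("\n"),
    )
    .join("\n");

  if (cleaned.length <= MAX_CHARS) return cleaned;

  // Pass 2: still too long, so keep only added/removed lines
  // (skipping "+++" / "---" file headers if present).
  return cleaned
    .split("\n")
    .filter((line) => /^[+-](?![+-]{2})/.test(line))
    .join("\n");
}
```

The second pass only runs when the first is not enough, so small PRs keep their surrounding context lines, which may help the embedding capture intent.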

2. Real‑time bot (live triage)

When a new PR arrives, query the vector store for the 8 most similar candidates.
Pass those candidates to an LLM reasoning loop to determine intent and assign a bucket (SHADOW / SUPERSET / COMPETING).
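
To make the retrieval step concrete, here is a minimal in-memory sketch of a top-8 lookup by cosine similarity. It stands in for the hosted vector store rather than calling it; `StoredPr`, `cosine`, and `topK` are hypothetical names, and the real store may rank results differently.

```typescript
// In-memory stand-in for the vector store; all names are hypothetical.
interface StoredPr {
  number: number;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the k most similar stored PRs, best match first (k = 8 as in the post).
function topK(
  query: number[],
  store: StoredPr[],
  k = 8,
): { number: number; score: number }[] {
  return store
    .map((pr) => ({ number: pr.number, score: cosine(query, pr.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The candidates returned here are only a shortlist. In the pipeline described above, the LLM stage still decides whether any of them is a genuine match.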

3. Handling Rate Limits

Router automatically pivots between providers (Gemini, Llama, OpenRouter, etc.).
3‑retry logic with exponential back‑off for 503/429 errors.
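
A rough sketch of how retry and provider fallback could fit together is shown below. The project's router is not published in this post, so `Provider`, `HttpError`, `backoffMs`, and `callWithFallback` are hypothetical, as are the 500 ms base delay and the choice to skip to the next provider on a non-retryable error.

```typescript
// Hypothetical types: a provider is any async function from prompt to reply.
type Provider = (prompt: string) => Promise<string>;

class HttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

const RETRYABLE = new Set([429, 503]);
const MAX_RETRIES = 3; // "3-retry logic" from the post

// Delay before retry n (0-based): base, 2x base, 4x base, ...
function backoffMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tries each provider in order. Retryable errors are retried with exponential
// back-off; anything else moves straight on to the next provider.
async function callWithFallback(
  providers: Provider[],
  prompt: string,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<string> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
      try {
        return await provider(prompt);
      } catch (err) {
        lastError = err;
        const retryable = err instanceof HttpError && RETRYABLE.has(err.status);
        if (!retryable || attempt === MAX_RETRIES) break;
        await wait(backoffMs(attempt));
      }
    }
  }
  throw lastError;
}
```

The `wait` parameter is injectable so the back-off can be replaced with a no-op in tests.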

Lessons Learned

Issue	Observation	Mitigation
Vector Gap	PRs solving the same problem in very different ways sometimes never surface in the vector search, so they never reach the LLM.	Added a fallback “semantic‑keyword” index (e.g., extracting domain‑specific terms) to broaden recall.
Structural Bias	Early models flagged unrelated JSON additions as duplicates because their shape looked similar.	Prioritised literal values (IDs, URLs) over pure syntax when building embeddings.
Model Quota	Exhausted higher‑tier AI quota and fell back to a smaller 8B model, which introduced more bias.	Implemented a budget‑aware scheduler that prefers cheaper providers only when similarity scores are already high.
False Positives	Vector similarity alone produced many spurious matches.	The LLM reasoning stage now performs a goal‑alignment check before final categorisation.
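
The "Structural Bias" mitigation can be illustrated with a literal-overlap check: compare the string values two diffs contain and ignore their keys and shape. This is a sketch of the idea only. `extractLiterals` and `jaccard` are hypothetical helpers, and the project may weight literals inside the embedding input rather than score them separately.

```typescript
// Collects quoted string values from a diff, skipping object keys
// (a quoted string immediately followed by a colon) and very short strings.
function extractLiterals(diff: string): Set<string> {
  const out = new Set<string>();
  const re = /(["'`])([^"'`\n]*)\1(\s*:)?/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(diff)) !== null) {
    const value = m[2] ?? "";
    const isKey = m[3] !== undefined;
    if (!isKey && value.length >= 3) out.add(value);
  }
  return out;
}

// Jaccard overlap: shared values divided by total distinct values.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((v) => {
    if (b.has(v)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Two registry entries with the same JSON shape but different names and URLs score 0 under this check, while two differently structured diffs that both mention `"/blocks"` still overlap.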

Takeaways

Historical clustering of redundant PRs is an excellent stress test for any detection engine.
A two‑stage pipeline (vector similarity → LLM reasoning) balances speed and accuracy.
Rate‑limit‑aware design (built to run within free‑tier budgets) is essential for open‑source tooling.
Even with sophisticated tooling, human review remains the final arbiter—especially for edge‑case “COMPETING” fixes.

If you’d like to see the code or run the audit on another repo, feel free to open an issue or drop a PR!

Problem

The system started flagging totally new registry entries as duplicates just because they looked similar at a JSON‑structure level. It was effectively ignoring the actual URL values because the embeddings weighted structure over content.

Weak on Wide Sweeps

When looking across large PRs, the system sometimes saw relationships that weren’t really there. If two big PRs happened to touch the same package, it could hallucinate a connection between them, even if, logically, they had nothing to do with each other.

Solution

The backfill engine is now the analytical core for what’s next: a live GitHub Bot. By using the historical memory we’ve built, the bot can analyze a new PR the second it’s opened and alert maintainers if a redundant fix already exists.

Future Work

I’m also exploring a Maintainer Dashboard to visualize these semantic clusters, giving project maintainers a high‑level view of where their contributors are accidentally overlapping.

Call to Action

If you’re a maintainer interested in trying it out on your repo or a developer who wants to contribute, hit me up—I’d love to chat.
