Embedding Accessibility into AI‑Based Software Development
**Source:** [Embedding Accessibility into AI‑based Software Development](https://dev.to/mfairchild365/embedding-accessibility-into-ai-based-software-development-282k)
CSUN‑AT 2026 – Embedding Accessibility into AI‑Powered Software Development
Speaker: [Your Name]
Key Takeaways
- AI is transforming the entire development lifecycle – it is now embedded in design tools, developer workflows, content creation, and user experiences. That speed and scale boost productivity, but they also amplify the risk of multiplying accessibility issues.
- Accessibility must be intentional – if we don’t embed it into AI‑powered workflows, we will scale accessibility barriers as fast as we scale productivity.
- LLMs generate poorly accessible code by default because they are trained on web code that already contains many accessibility problems.
- Explicit accessibility instructions dramatically improve outcomes – structured guidance can push some models from near‑zero to > 90 % pass rates.
- Teams should embed accessibility into AI tooling and pipelines using:
  - Custom instructions / prompts
  - CI/CD accessibility checks
  - Ongoing manual testing
Evaluation Tool (Microsoft)
I built an open‑source benchmark to measure how well LLMs produce accessible code.
| Item | Details |
|---|---|
| Repository | (link to repository) |
| What it does | 1. Runs a suite of prompts that generate pages and common components. 2. Evaluates the resulting code with the axe‑core automated scanner via Playwright. 3. Adds a set of custom tests (beyond axe‑core) for keyboard behavior, required semantics, etc. |
Note: Axe‑core is a generic tool; it cannot test every WCAG success criterion (e.g., keyboard interactions, nuanced semantics). Therefore, the suite does not fully evaluate WCAG compliance, and manual testing remains essential.
- The prompts do not contain any accessibility guidance – they serve as a baseline/control to see how LLMs perform without explicit instructions.
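As a rough illustration of how a pass rate could be derived from the two layers of checks described above, here is a minimal sketch. It is not the benchmark's actual code: the `CaseResult` shape, the prompt ids, the custom check names, and the rule that a case passes only when both layers report nothing are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    """Outcome of one generated page or component (hypothetical shape)."""
    prompt_id: str
    # Ids of axe-core rules that reported a violation.
    axe_violations: list[str] = field(default_factory=list)
    # Names of failed custom checks (keyboard behavior, required semantics, ...).
    custom_failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Assumption: a case passes only when both layers report nothing.
        return not self.axe_violations and not self.custom_failures


def pass_rate(results: list[CaseResult]) -> float:
    """Return the percentage of cases that passed, or 0.0 for an empty run."""
    if not results:
        return 0.0
    return 100.0 * sum(r.passed for r in results) / len(results)


# Hypothetical run of four prompts; only the first one is clean.
results = [
    CaseResult("login-form"),
    CaseResult("modal-dialog", custom_failures=["focus-not-trapped"]),
    CaseResult("data-table", axe_violations=["color-contrast"]),
    CaseResult("nav-menu", axe_violations=["button-name"],
               custom_failures=["no-arrow-key-support"]),
]
print(f"{pass_rate(results):.0f} % of cases passed")
```

Keeping the automated scanner findings and the custom check failures in separate fields makes it easy to see which layer caught a problem, which matters because axe-core alone cannot cover keyboard behavior.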
Results (most recent report)
| Model | Pass Rate |
|---|---|
| GPT 5.2 | 41 % |
| GPT‑4 (other variants) | – |
| Top 3 models | All GPT‑based |
| All other models\* | 0 % to near 0 % |

\*Includes Gemini 3 Pro, Grok 4 Fast Non‑Reasoning, Gemini 3 Flash Preview, DeepSeek V3.2, Claude Haiku 4.5, Claude Sonnet 4.5, Claude Opus 4.6, etc.
- Average score across all models: ~10 %
- Why so low? Roughly 95 % of websites have accessibility issues, so most training data is already inaccessible.
- Why is GPT better? Likely trained on a higher‑quality dataset that contains more accessible code.
Custom Instructions for Accessibility
Custom instructions are typically .md files that define common guidelines and rules, automatically influencing AI‑generated code. VS Code has excellent documentation on setting these up. When configured correctly, the agent applies the instructions to all prompts.
Benchmarked Instruction Sets (LLM‑Eval)
| Instruction Set | Description | Pass‑Rate Improvement |
|---|---|---|
| Minimal | “All output MUST be accessible.” | +18 percentage points |
| Basic | • “All output MUST be accessible.” • “Use semantic HTML first; only use ARIA when necessary, and ensure full keyboard support.” • “Conform to WCAG 2.2 Level AA.” | +37 percentage points |
| Detailed | Comprehensive, expert‑level guidance (see the Awesome Copilot project). | +48 percentage points |
Simply mentioning accessibility has a large effect: the one‑sentence Minimal set alone improves the pass rate by 18 percentage points.
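To make the "Basic" instruction set concrete, the sketch below renders its three rules as a small markdown file body. The rule text comes from the table above; the `render_instructions` helper and the "Accessibility" heading are hypothetical and only illustrate one way to keep such rules in version control.

```python
def render_instructions(title: str, rules: list[str]) -> str:
    """Render rules as a markdown bullet list under a single heading."""
    lines = [f"# {title}", ""]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines) + "\n"


# The "Basic" instruction set from the benchmark table.
BASIC_RULES = [
    "All output MUST be accessible.",
    "Use semantic HTML first; only use ARIA when necessary, "
    "and ensure full keyboard support.",
    "Conform to WCAG 2.2 Level AA.",
]

content = render_instructions("Accessibility", BASIC_RULES)
print(content)
```

The resulting text would then be saved wherever your AI tool reads custom instructions from. For GitHub Copilot that is commonly `.github/copilot-instructions.md`, but check your tool's documentation before relying on a specific path.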
Observations with Detailed Instructions
- Some models (e.g., GPT) exceed 90 % pass rates.
- Other models show only marginal gains.
- Certain models (e.g., Grok) remain at 0 %.
The detailed instructions are published in the Awesome Copilot repository – a great starting point.
Crafting Effective Custom Instructions
These tips are based on GitHub guidance and real‑world experience.
- Tailor to your team/project – Define specific workflows, tools, standards, design systems, and component libraries.
- Use precise normative language – Use terms such as MUST, MUST NOT, SHOULD, and SHOULD NOT.
- Structure with lists – LLMs perform best with clear, hierarchical formatting.
- Ask an agent to optimize your instructions – A quick “refine this instruction set for clarity” can be helpful.
- Avoid pasting entire standards (e.g., WCAG, ARIA) – This often degrades output quality.
- Don’t rely on external links for critical resources – Agents typically won’t follow them.
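Several of these tips can be checked mechanically before an instruction file is committed. The following is a minimal sketch of such a check, not an established tool: the function name, the warning wording, and the 200‑line threshold are assumptions chosen for illustration, and the heuristics are deliberately crude.

```python
import re

NORMATIVE = re.compile(r"\b(MUST NOT|MUST|SHOULD NOT|SHOULD)\b")
LINK = re.compile(r"https?://\S+")
BULLET = re.compile(r"^\s*(?:[-*]|\d+\.)\s+")


def lint_instructions(text: str, max_lines: int = 200) -> list[str]:
    """Return human-readable warnings for an instruction file (heuristics only)."""
    warnings = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not NORMATIVE.search(text):
        warnings.append("no normative keywords (MUST, SHOULD, ...) found")
    if lines and not any(BULLET.match(line) for line in lines):
        warnings.append("no list items found; consider structuring rules as lists")
    if LINK.search(text):
        warnings.append("external link found; inline the critical guidance instead")
    if len(lines) > max_lines:
        # Arbitrary threshold: very long files often mean a pasted standard.
        warnings.append(f"more than {max_lines} non-empty lines; avoid pasting whole standards")
    return warnings


# A weak draft: vague wording, no list, and a link the agent may not follow.
draft = "Please make things accessible. See https://example.com/wcag for details."
for warning in lint_instructions(draft):
    print("warning:", warning)
```

A check like this cannot judge whether the rules fit your team or project, so it complements rather than replaces a human review of the instruction file.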
AI’s Broad Impact on Software Development
| Change | Opportunity |
|---|---|
| AI‑accelerated product & UX research – “synthetic users” (AI bots) simulate feedback on ideas and designs. | Synthetic users can provide quick accessibility feedback, but they cannot replace the lived experiences of people with disabilities. |
| AI‑driven data analysis – identifies trends that drive new features or changes. | AI can help surface accessibility issues from customer feedback and telemetry, provided privacy is respected. |
| Designers using AI for rapid prototyping – moving from static mock‑ups to code‑ready prototypes; designers and developers often work in parallel. | Designers can contribute directly to accessible component libraries and code when AI tools are guided by proper accessibility instructions. |
Takeaway
Embedding accessibility into AI‑driven workflows is not optional – it’s essential to prevent the amplification of existing barriers while reaping AI’s productivity gains. By using custom instruction files, integrating CI/CD accessibility checks, and maintaining manual validation, teams can ensure that the speed AI brings does not come at the cost of inclusive, usable software.
Production Code
This has yet to become a reality.
Opportunity
- AI can be leveraged to help designers annotate for accessibility quickly and accurately, as well as review their designs and annotations.
- Designers can also use custom instructions for accessibility to improve their vibe‑coded prototypes.
Change
- Development is happening on a much larger scale than ever before, and testing is struggling to keep up.
Opportunity (Testing)
- Leverage AI to facilitate and assist in testing.
- Clear policy and quality gates are now more important than ever and need to be consistently enforced.
- Ensure that accessibility is baked into the CI/CD pipeline and that it blocks pull requests when requirements aren’t met.
- Manual testing by humans remains essential.
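A quality gate of this kind can be reduced to a small decision function that a CI job calls after the automated scan. The sketch below assumes the scan results are already available as a list of dictionaries with `id` and `impact` keys, which mirrors how axe-core labels violations (minor, moderate, serious, critical). The function names, the default threshold, and the sample data are hypothetical.

```python
# axe-core reports each violation with one of these impact levels.
IMPACT_ORDER = ["minor", "moderate", "serious", "critical"]


def blocking_violations(violations: list[dict], threshold: str = "serious") -> list[dict]:
    """Return violations whose impact is at or above the threshold."""
    minimum = IMPACT_ORDER.index(threshold)
    return [
        v for v in violations
        if v.get("impact") in IMPACT_ORDER
        and IMPACT_ORDER.index(v["impact"]) >= minimum
    ]


def gate(violations: list[dict], threshold: str = "serious") -> int:
    """Return an exit code: 0 lets the pull request proceed, 1 blocks it."""
    blockers = blocking_violations(violations, threshold)
    for v in blockers:
        print(f"BLOCKING {v['impact']}: {v['id']}")
    return 1 if blockers else 0


# Hypothetical scan output: one serious and one moderate finding.
sample = [
    {"id": "color-contrast", "impact": "serious"},
    {"id": "region", "impact": "moderate"},
]
exit_code = gate(sample)
print("exit code:", exit_code)
```

In a pipeline, the returned value would typically be passed to `sys.exit` so that a non-zero code fails the job and blocks the merge. A passing gate only means the automated checks found nothing above the threshold; it does not replace manual testing.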