I tried using AI to build an exam system. It worked… until it didn’t.

Published: May 2, 2026 at 01:05 PM EDT

Source: Dev.to

The Journey from Structured‑Data Bugs to an AI‑Powered Exam Platform

I didn’t start with the idea of building an exam platform. It began with a different problem: we were using AI to generate structured data for APIs. At first the responses looked fine, but in production we started seeing subtle format errors—e.g., 120.5 instead of the required 120.50. The downstream system rejected the value because it expected an exact format.

These tiny issues took a lot of time to debug and kept re‑appearing.
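One way to stop this class of bug is to normalize the value in code instead of asking the model to get it right. The sketch below is only an illustration: the `AmountFormat` class, its method name, and the two-decimal rule are assumptions, not code from the project.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper: the class name and the two-decimal rule are assumptions.
final class AmountFormat {
    private AmountFormat() {}

    // Parses the raw AI value and re-emits it with exactly two decimal places.
    // Throws NumberFormatException if the input is not a plain decimal number.
    static String toTwoDecimals(String raw) {
        BigDecimal value = new BigDecimal(raw.trim());
        // HALF_UP rounds extra digits; a stricter variant could reject them instead.
        return value.setScale(2, RoundingMode.HALF_UP).toPlainString();
    }
}
```

With this in place, `AmountFormat.toTwoDecimals("120.5")` returns `"120.50"`, so the downstream system sees the format it expects regardless of how the model wrote the number.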

Why This Matters for Exams

If AI can slip on simple numeric formatting, what happens when we ask it to generate exam questions or evaluate answers? In demos it looks impressive—instant question creation, automatic grading—but in real usage consistency becomes a problem:

Difficulty levels vary randomly.
Answers are not always structured the same way.
Evaluation can feel subjective.

That’s not reliable for students or schools.

The First Attempt: Prompt Engineering

I tried the usual fix: improve the prompts. I made them longer, added more rules, and was very specific. It helped a little, but it didn’t solve the core issue—edge cases still produced slightly off output.

The real problem wasn’t the prompt; it was trusting raw AI output without any control.

The Solution: A Small Java Wrapper

Instead of “fixing AI,” I built a lightweight Java application (distributed as a JAR). Users can:

Enter their details.
Choose a subject and topic.
Generate questions, run a timer, submit answers, and receive a report.

The novelty isn’t the UI; it’s what happens between the UI and the AI. Below is a walkthrough of the UI screens.

Home Page (Landing + Features)

The main entry point shows the overall idea clearly—a full‑stack AI‑powered exam platform where users can generate questions, register, and select topics.

It is not just a basic form‑based app.
It is positioned as a complete examination framework with AI question generation, evaluation, a timer, and reporting already integrated.
The feature section below highlights the core capabilities in a structured way, emphasizing that the platform handles the full exam lifecycle, not just question generation.

Student Registration

The registration form captures detailed student information: name, email, age, country, experience, interests, and education level.

This context allows the system to personalize question generation.
The design moves beyond generic questions toward adaptive exam generation.
The layout is simple, but the idea is strong: collect enough context for the AI to produce relevant, meaningful questions.

Topic Selection

This screen lists pre‑defined exam topics (Java, Spring, System Design, etc.), each with clear details:

Difficulty level
Number of questions
Time duration

This screen adds structure:

Flexibility – multiple topics to choose from.
Control – fixed duration and difficulty levels.

Reducing randomness makes the exam more predictable and fairer.

Exam Start Screen

Instructions are displayed before the exam begins, including rules such as:

Time limits.
No page refresh.
Answers cannot be changed once submitted.

Mathematics – Easy

This screen shows basic math questions (addition, multiplication). Although it looks like a normal quiz, the questions are generated on the fly by the AI based on the selected subject (Mathematics) and difficulty (Easy).

When the user selects the topic, the system sends a prompt to the AI such as:

“Generate easy‑level math questions with two‑digit addition and single‑digit multiplication. Return each question in the format Q: and the answer in the format A: .”

The response is then parsed, displayed, and later evaluated automatically.
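The parsing step can be sketched as a strict line-by-line reader. This is a minimal illustration under stated assumptions, not the project's parser: it assumes each question sits on a line starting with `Q:` and each answer on a line starting with `A:`, and the names `QaParser` and `QaPair` are hypothetical. It uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the "Q: ... / A: ..." layout requested in the prompt.
final class QaParser {
    private QaParser() {}

    record QaPair(String question, String answer) {}

    // Returns the question/answer pairs in order, or throws if the layout is broken.
    static List<QaPair> parse(String response) {
        List<QaPair> pairs = new ArrayList<>();
        String pendingQuestion = null; // a question still waiting for its answer
        for (String line : response.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines between pairs are tolerated
            }
            if (trimmed.startsWith("Q:")) {
                if (pendingQuestion != null) {
                    throw new IllegalArgumentException("Question without answer: " + pendingQuestion);
                }
                pendingQuestion = trimmed.substring(2).trim();
            } else if (trimmed.startsWith("A:")) {
                if (pendingQuestion == null) {
                    throw new IllegalArgumentException("Answer without question: " + trimmed);
                }
                pairs.add(new QaPair(pendingQuestion, trimmed.substring(2).trim()));
                pendingQuestion = null;
            } else {
                // Any other text (greetings, explanations) is rejected, not guessed at.
                throw new IllegalArgumentException("Unexpected line: " + trimmed);
            }
        }
        if (pendingQuestion != null) {
            throw new IllegalArgumentException("Question without answer: " + pendingQuestion);
        }
        return pairs;
    }
}
```

Rejecting unexpected lines instead of skipping them is a deliberate choice here: a response that drifts from the requested layout is treated as invalid and can be regenerated.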

Key Takeaways

Control the AI, don’t trust it blindly.
Add a deterministic layer (prompt templates, strict output parsing, validation) to guarantee consistency.
Collect contextual data (student profile, topic settings) to make AI output more relevant and reproducible.
Wrap the AI in a thin application that enforces rules, timing, and reporting—turning a “cool demo” into a production‑ready exam platform.

If you’re interested in the source code or want to try the JAR yourself, feel free to reach out!

Overview

The system validates every AI‑generated exam question before it is presented to the student.
The validation layer checks:

Structure – each question must have exactly four options.
Formatting consistency – the same layout is used for all questions.
Correct answer extraction – the designated answer is identified and stored.

For math questions the system can also verify the answer deterministically (e.g., recompute 15 + 27 and confirm the result).
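The structural checks listed above could look roughly like this. It is a sketch under assumptions: the `Question` record, its field names, and the rule that the answer must match one option verbatim are illustrative choices, not the repository's data model. It needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical validator: record shape and error messages are assumptions.
final class QuestionValidator {
    private QuestionValidator() {}

    record Question(String text, List<String> options, String answer) {}

    // Returns a list of problems; an empty list means the question passed.
    static List<String> validate(Question q) {
        List<String> errors = new ArrayList<>();
        if (q.text() == null || q.text().isBlank()) {
            errors.add("question text is empty");
        }
        List<String> options = q.options() == null ? List.of() : q.options();
        if (options.size() != 4) {
            errors.add("expected exactly 4 options, got " + options.size());
        }
        if (new HashSet<>(options).size() != options.size()) {
            errors.add("options contain duplicates");
        }
        // The stored answer must be one of the options shown to the student.
        if (q.answer() == null || !options.contains(q.answer())) {
            errors.add("answer is not one of the options");
        }
        return errors;
    }
}
```

Returning every problem at once, instead of failing on the first one, makes it easier to log why a generated question was discarded.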

Science – Medium

On this screen the questions are conceptual (chemical symbols, physics facts). Because the answers are not numeric, the system relies on the AI's knowledge but still enforces:

Structured question format
A single correct answer
A valid option set

Validation Process

Cross‑checks answer format – ensures the answer follows the expected pattern.
Ensures only one correct answer is marked.
Optionally re‑prompts the AI if the response is ambiguous.
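The "single correct answer" check and the optional re-prompt can be sketched as a bounded retry loop. Everything named here is an assumption for illustration: `Reprompt`, `firstValid`, and `exactlyOneMarked` are hypothetical, and the `Supplier` stands in for whatever call actually asks the AI for a new response.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical retry helper; the generator abstracts the real AI call.
final class Reprompt {
    private Reprompt() {}

    // Asks the generator up to maxAttempts times and returns the first valid result.
    static <T> Optional<T> firstValid(Supplier<T> generator, Predicate<T> isValid, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T candidate = generator.get();
            if (candidate != null && isValid.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // caller decides: skip the question or report an error
    }

    // True only when exactly one option is flagged as correct.
    static boolean exactlyOneMarked(List<Boolean> correctFlags) {
        return correctFlags.stream().filter(Boolean::booleanValue).count() == 1;
    }
}
```

Capping the number of attempts keeps an ambiguous topic from turning into an endless loop of AI calls; an empty result tells the caller to drop the question.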

Programming – Hard

These screens contain advanced questions (time‑complexity, data structures). The AI generates:

Domain‑specific questions
Appropriate difficulty levels
Relevant answer choices

Reliability Measures

Enforces known patterns (e.g., Big‑O notation format)
Validates option consistency
Ensures only one correct answer is selected

Rule‑based validation is also applied where possible, for example:

Valid complexity values (O(n), O(log n), etc.)
Known correct answers for standard problems
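A rule for "valid complexity values" can be as simple as a whitelist. The sketch below is an assumption about how such a rule might look: the `BigOFormat` class and the specific set of accepted notations are illustrative, and a real deployment would extend the set to match its question bank.

```java
import java.util.Set;

// Hypothetical whitelist check; the accepted values are an illustrative assumption.
final class BigOFormat {
    private BigOFormat() {}

    // Canonical forms, stored without whitespace.
    private static final Set<String> ALLOWED = Set.of(
            "O(1)", "O(logn)", "O(n)", "O(nlogn)",
            "O(n^2)", "O(n^3)", "O(2^n)", "O(n!)");

    // Accepts "O(n log n)" and "O( n )" alike by stripping whitespace first.
    static boolean isValid(String option) {
        if (option == null) {
            return false;
        }
        String normalized = option.replaceAll("\\s+", "");
        return ALLOWED.contains(normalized);
    }
}
```

A whitelist is stricter than a general pattern match: free-form answers such as "linear" or a lowercase "o(n)" are rejected, so every option in a complexity question has the same shape.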

Exam Flow

“Start Exam Now” separates setup from execution, providing clear flow control.

Every AI response passes through the validation layer, which:

Checks structure
Fixes formatting issues
Ensures required fields exist
Guarantees output consistency

Thus, the AI suggests content, but the system decides what is actually used.

Showing Answers

When a student clicks “Show Answer”:

The answer is extracted from the AI response.
It is validated for correctness and proper formatting.
For numerical questions, the system re‑checks the calculation before displaying it.

This gives students confidence that the displayed answer is accurate and reliable.
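For the numerical re-check, the answer can be recomputed in code and compared with what the AI claimed. This is a minimal sketch that assumes questions of the form `a + b` or `a * b` with non-negative integers, matching the easy math example; `ArithmeticCheck` and its methods are hypothetical names.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for simple "a + b" and "a * b" expressions only.
final class ArithmeticCheck {
    private ArithmeticCheck() {}

    // Operands are capped at 9 digits so the product always fits in a long.
    private static final Pattern SIMPLE =
            Pattern.compile("\\s*(\\d{1,9})\\s*([+*])\\s*(\\d{1,9})\\s*");

    // Recomputes the expression, or returns empty if it is not in the supported form.
    static OptionalLong recompute(String expression) {
        if (expression == null) {
            return OptionalLong.empty();
        }
        Matcher m = SIMPLE.matcher(expression);
        if (!m.matches()) {
            return OptionalLong.empty();
        }
        long left = Long.parseLong(m.group(1));
        long right = Long.parseLong(m.group(3));
        return OptionalLong.of(m.group(2).equals("+") ? left + right : left * right);
    }

    // True only when the claimed answer equals the recomputed value.
    static boolean answerMatches(String expression, String claimedAnswer) {
        OptionalLong expected = recompute(expression);
        if (!expected.isPresent() || claimedAnswer == null) {
            return false;
        }
        try {
            return expected.getAsLong() == Long.parseLong(claimedAnswer.trim());
        } catch (NumberFormatException e) {
            return false; // a non-numeric answer never matches
        }
    }
}
```

For example, `ArithmeticCheck.answerMatches("15 + 27", "42")` is true, while a claimed answer of `"41"` is rejected before it reaches the student.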

Impact

Adding the validation layer turned a demo‑like prototype into a stable, usable tool:

Outputs are now consistent across runs.
The system behaves predictably, no longer feeling like a “demo”.
The implementation remains intentionally simple: a Java JAR with an in‑memory database that can be run locally.

Call for Collaboration

If you have experience with:

AI‑generated data
Exam or quiz systems
Validation/control layers

I’d love to hear how you tackled similar challenges. Did you refine prompts, or did you introduce a control mechanism?

Project repository:
https://github.com/swapneswarsundarray/ai-assisted-exam

The project is early‑stage and open to contributions. Feel free to try it out, suggest improvements, or add real‑world input.

Takeaway

AI alone doesn’t replace a system; it becomes a component of a larger architecture.
The real value comes from the surrounding structure, validation, and control that make the whole solution reliable.