AAoM-02: XML Parser with W3C Conformance

Published: January 13, 2026 at 09:32 PM EST
4 min read
Source: Dev.to

Skill

I’m still using Claude Code (Opus 4.5) with the MoonBit system prompt and IDE skill.
I also created a new skill named moonbit-lang to teach the AI best practices and common pitfalls for the MoonBit language. Its header looks like this:

---
name: moonbit-lang
description: "MoonBit language reference and coding conventions. Use when writing MoonBit code, asking about syntax, or encountering MoonBit-specific errors. Covers error handling, FFI, async, and common pitfalls."
---

# MoonBit Language Reference

@reference/fundamentals.md
@reference/error-handling.md
@reference/ffi.md
@reference/async-experimental.md
@reference/package.md
@reference/toml-parser-parser.mbt

In this skill doc I also mention the official file‑I/O package moonbitlang/x/fs, which the AI is not familiar with.
The complete skill doc and references can be accessed on GitHub, where I continuously update the skills I use.

The AI (both Codex and Claude) reads only the description at startup and the rest on demand. I keep the skill doc simple because, in my experience, excessively long documents hinder the AI’s ability to understand the details.

Problem

XML remains ubiquitous in configuration files, data interchange, and legacy systems. A conformant XML parser must handle:

  • Element tags, attributes, and namespaces
  • Entity references
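To make the entity-reference item concrete: XML 1.0 predefines five named entities (lt, gt, amp, apos, quot) and also allows decimal and hexadecimal character references. The Python sketch below expands only those and ignores DTD-declared entities. It is an illustration under those assumptions, not the MoonBit parser's code, and `expand_references` is a hypothetical helper name.

```python
import re

# The five entities every XML 1.0 processor must recognize.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Matches &#xHH; (hex), &#DD; (decimal), or &name; (simplified name syntax).
_REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][\w.\-:]*);")


def expand_references(text):
    """Expand predefined entities and character references in one pass."""

    def repl(match):
        body = match.group(1)
        if body.startswith("#x"):
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
        if body in PREDEFINED:
            return PREDEFINED[body]
        # A real parser would look the name up in the DTD before failing.
        raise ValueError(f"undeclared entity: &{body};")

    return _REF.sub(repl, text)
```

Because the substitution is a single pass, an `&` produced by `&amp;` is not scanned again. A conformant parser additionally has to check that character references fall inside the legal XML character range, which this sketch skips.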

Below is a simple test that parses a minimal document and inspects the resulting event stream:

let xml = "\n\n\n"
let reader = Reader::from_string(xml)
let events : Array[Event] = []
for {
  match reader.read_event() {
    Eof => {
      events.push(Eof)
      break
    }
    event => events.push(event)
  }
}
inspect(
  to_libxml_format(events),
  content="[DocType(\"doc\"), Empty({name: \"doc\", attributes: []}), Eof]",
)

A not‑well‑formed example:

test "w3c/not-wf/not_wf_sa_001" {
  // Attribute values must start with attribute names, not "?".
  let xml = "\n\n\n"
  let reader = Reader::from_string(xml)
  let has_error = for {
    try reader.read_event() catch {
      _ => break true
    } noraise {
      Eof => break false
      _ => continue
    }
  }
  inspect(has_error, content="true")
}

A total of 735 tests were generated, comprising ~14k lines of code. After adding a few manually‑written tests, the suite now contains 800 tests.

Parser Implementation

Since quick‑xml was the initial reference, Claude followed a pull‑parser architecture inspired by it, which I thought was acceptable for our goal. The API looks like this:

let reader = @xml.Reader::from_string(xml)
for {
  match reader.read_event() {
    Eof => break
    Start(elem) => println("Start: \{elem.name}")
    End(name)   => println("End: \{name}")
    Text(content) => println("Text: \{content}")
    _ => continue
  }
}

Because lxml returns a tree while our parser emits events, I asked Claude to implement a to_libxml_format function that transforms our event stream into the exact format produced by lxml. This made test comparison straightforward.
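The opposite direction, rendering a reference tree as an event-list string, can be sketched in Python. This is a rough illustration under assumptions, not the actual generator script: it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, the helper names `render_events` and `tree_to_event_string` are hypothetical, and the attribute and text rendering are guesses. Only the `Empty({name: ..., attributes: [...]})` and `Eof` shapes come from the expected string shown earlier.

```python
import xml.etree.ElementTree as ET


def render_events(elem):
    """Flatten one element and its subtree into a list of event strings."""
    attrs = ", ".join(f'("{k}", "{v}")' for k, v in elem.attrib.items())
    head = f'{{name: "{elem.tag}", attributes: [{attrs}]}}'
    # The tree cannot tell <doc/> from <doc></doc>; both render as Empty.
    if not elem.text and len(elem) == 0:
        return [f"Empty({head})"]
    events = [f"Start({head})"]
    if elem.text:
        events.append(f'Text("{elem.text}")')
    for child in elem:
        events.extend(render_events(child))
        if child.tail:  # text that follows the child's end tag
            events.append(f'Text("{child.tail}")')
    events.append(f'End("{elem.tag}")')
    return events


def tree_to_event_string(xml_text):
    """Parse with the stand-in reference parser and render the event list."""
    root = ET.fromstring(xml_text)
    return "[" + ", ".join(render_events(root) + ["Eof"]) + "]"
```

The stand-in does not expose the DOCTYPE, so no `DocType(...)` event appears here, and quotes inside text are not escaped. The point is only that a tree and an event stream describe the same document once both are flattened to one canonical string.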

The basic implementation took about 4 hours of AI‑only work (aside from occasional “Please continue” prompts). The most complex feature was DTD parsing and validation. I used Claude’s plan mode to structure the implementation. Below is a summary of that plan:

[Figure: plan summary, containing a project summary and a diagram; image not reproduced]

After about 1 hour, DTD support was implemented and 726 tests passed.
It then took another 3 hours to handle edge cases such as:

  • Entity value expansion
  • Text‑splitting details
  • UTF‑8 BOM handling
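Of these, BOM handling is the simplest to show. A UTF‑8 byte order mark is the three bytes EF BB BF at the very start of the input, and a parser has to drop it before looking for the XML declaration. A minimal Python sketch follows; `strip_utf8_bom` is a hypothetical helper, not a function from the MoonBit parser.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the input without a leading UTF-8 byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return data[len(codecs.BOM_UTF8):]
    return data
```

Only a BOM at offset zero is removed; the same bytes later in the document are ordinary content (U+FEFF) and must be kept.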

Results

At the end of the effort, all 800 tests in the suite passed.

  • 59 tests were skipped by the tests‑gen script because lxml disagreed with the W3C suite:
    • Some were valid documents that lxml rejected.
    • Others were not well‑formed documents that lxml accepted.

These were marked as “lxml implementation quirks”.
Since the edge cases were overly complicated, I didn’t verify each one in detail, but the remaining 800 tests were sufficient for confidence.
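The skip rule described here can be sketched as a small classifier: generate a test only when the reference parser's verdict matches what the suite says about the document. The Python below is an assumption-laden sketch, not the real tests‑gen script. It uses `xml.etree.ElementTree` in place of lxml, and the names `reference_accepts` and `classify` are hypothetical.

```python
import xml.etree.ElementTree as ET


def reference_accepts(xml_text):
    """True if the stand-in reference parser parses the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def classify(case_kind, xml_text):
    """Decide whether to generate a test for one suite entry.

    case_kind is "valid" or "not-wf", as labelled by the test suite.
    """
    expected_accept = case_kind != "not-wf"
    if reference_accepts(xml_text) == expected_accept:
        return "generate"
    # Reference parser and suite disagree: treat as a reference quirk.
    return "skip: reference parser disagrees with the suite"
```

For example, `classify("not-wf", "<doc><a></doc>")` returns `"generate"` because the stand-in rejects the mismatched tags, while `classify("valid", "<doc>")` is skipped because the stand-in rejects a document the label claims is fine.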

Supported Features

  • XML 1.0 + Namespaces 1.0
  • Pull‑parser API for memory‑efficient streaming
  • Writer API for XML generation
  • DTD support with entity expansion

Reflections

What Worked Well?

  • Using an official test suite – The W3C conformance tests uncovered obscure edge cases (character references, DTD quirks, namespace handling, etc.) that I would never have thought to test manually.
  • Switching reference implementations – quick‑xml is intentionally lenient, which made conformance testing difficult. Switching to libxml2 gave me a strict reference.
  • Planning mode for complex features – Breaking DTD parsing into a plan kept the work organized; without it, I would have jumped between unrelated bugs.

Challenges Encountered

Claude often tried to modify the tests instead of fixing the parser:

  • Changing test expectations to match incorrect output.
  • Updating the test generator to skip failing tests.
  • Marking tests as “lenient” and skipping them.

I had to repeatedly remind Claude: “Update the MoonBit implementation, not the tests.”

Other recurring issues:

  • Forgetting project conventions (e.g., not using the moon‑ide skill for navigation, using match (try? expr) instead of try/catch/noraise).
  • Adding these conventions to CLAUDE.md helped but didn’t eliminate the problem.

I found a related discussion on Reddit that suggests this is a bug in Opus 4.5 and Sonnet 4.5. Hopefully it will be fixed soon.

Future Work

I anticipate needing to implement or port many more parsers. My plan is to turn the experience of writing parsers and generating standards‑based test scripts into reusable skills or commands, so the next project can benefit from this groundwork.

Time Investment (≈ 10 hours)

Activity | Hours
Collaborative exploration of test‑generation script | 2
Autonomous implementation of basic features | 4
Planning & implementing DTD, namespaces, entities | 1
Handling edge cases (fixing 17 test failures) | 3

The code is available on GitHub:
