We validated our COBOL-to-Python engine on 15,552 real-world programs. 98.78% produce valid Python. Zero LLMs involved.

Published: April 5, 2026 at 12:39 AM EDT

Source: Dev.to

The corpus

15,552 COBOL source files: not synthetic benchmarks, but real programs collected from 131 open‑source repositories across five continents.

Countries: Norway, France, Brazil, India, Japan, USA, among others
Sources: GitHub, HuggingFace, CBT Tape, GnuCOBOL, IBM public repositories
Dialects: commercial COBOL, GnuCOBOL extensions, TypeCOBOL, mainframe dialects

No hand-picked or curated samples: everything we could find went in.

The result

Version	Files processed	Valid Python	Failures	Net gain (valid files)
v5.6 (before)	14,508	14,020 (96.64%)	488	–
v5.8e (after)	15,552 (+1,044)	15,362 (98.78%)	190	+1,342

On the original v5.7 reference corpus, the success rate reached 99.25%, with 180 of its 289 failures corrected in a single session.
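As a sanity check, the rates and failure counts can be recomputed from the raw file counts in the results table above. The sketch below does that in plain Python; the `success_rate` helper is ours, not part of the engine. It gives 96.64% (488 failures) for v5.6, 98.78% (190 failures) for v5.8e, and a net gain of 1,342 valid files.

```python
def success_rate(valid: int, total: int) -> float:
    """Share of files whose generated Python parses, as a percentage rounded to 2 places."""
    return round(100 * valid / total, 2)


# File counts copied from the results table.
runs = {
    "v5.6": {"total": 14_508, "valid": 14_020},
    "v5.8e": {"total": 15_552, "valid": 15_362},
}

for version, run in runs.items():
    failures = run["total"] - run["valid"]
    print(f"{version}: {success_rate(run['valid'], run['total'])}% valid, {failures} failures")

net_gain = runs["v5.8e"]["valid"] - runs["v5.6"]["valid"]
print(f"net gain: +{net_gain} valid files")
```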

What “valid Python” means

We do not use LLMs, string comparison, or style checks. Validation uses Python's standard-library `ast.parse()`:

If the generated Python parses without raising a SyntaxError, it is considered valid.
If a SyntaxError is raised, it fails.

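This binary, deterministic check leaves no room for interpretation or hallucination. It verifies syntax, not semantics: a file that passes is well-formed Python, not necessarily a behaviorally correct translation.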
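A minimal sketch of such a check, assuming the generated files sit as `.py` files under a single output directory. The directory layout and the names `is_valid_python` and `validate_dir` are our assumptions, not the engine's actual harness. Note that `ast.parse()` only builds a syntax tree; nothing is executed.

```python
import ast
from pathlib import Path


def is_valid_python(source: str) -> bool:
    """True if `source` parses as Python; a pure syntax check, nothing is executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # Older Python versions raise ValueError instead of SyntaxError
        # for source containing null bytes; treat both as failures.
        return False
    return True


def validate_dir(out_dir: Path) -> tuple[int, list[Path]]:
    """Parse every .py file under out_dir; return (valid count, failing paths)."""
    valid, failures = 0, []
    for path in sorted(out_dir.rglob("*.py")):
        if is_valid_python(path.read_text(encoding="utf-8")):
            valid += 1
        else:
            failures.append(path)
    return valid, failures


print(is_valid_python("total = 0\ntotal += 1\n"))  # True
print(is_valid_python("def f(:\n    pass\n"))       # False
```

Sorting the paths keeps the failure list in a stable order from run to run, which fits the deterministic, auditable spirit of the check.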

What fails and why

190 files still fail. Their categories are:

Category	Approx. count	Typical issues
TypeCOBOL	~60	Multi‑level qualifications, `REPLACE`, typed expressions
GnuCOBOL extensions	~40	GUI, bitwise composition, OO, `SCREEN SECTION`
Non‑standard COBOL	~30	WebSocket, brainfuck interpreter, .NET GUI
Deep STRING/UNSTRING	~25	Complex nesting, multiple delimiters
Exotic mainframe	~35	Exotic mainframe constructs
CICS inline / EXEC SQL	–	Complex `EXEC SQL`, nested copybooks

These are not parsing bugs in the usual sense but constructs at the outer edge of what a standard COBOL parser can handle, and the sanitizer cannot repair output for code the parser never understood. We are actively working on them.
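One way to bucket failures like these is a keyword scan over the failing COBOL sources. The sketch below assigns each file to a category from the table by looking for tell-tale markers. The marker patterns (including `TYPEDEF` as a TypeCOBOL signal), the category order, and the sample files are illustrative assumptions, not how AGUELLID CODE classifies its failures.

```python
import re
from collections import Counter

# Hypothetical marker patterns, checked in order; the first match wins.
# A real triage would rely on the parser's own diagnostics rather than regexes.
CATEGORY_MARKERS = [
    ("CICS inline / EXEC SQL", re.compile(r"\bEXEC\s+(?:SQL|CICS)\b", re.IGNORECASE)),
    ("TypeCOBOL", re.compile(r"\bTYPEDEF\b", re.IGNORECASE)),
    ("GnuCOBOL extensions", re.compile(r"\bSCREEN\s+SECTION\b", re.IGNORECASE)),
    ("Deep STRING/UNSTRING", re.compile(r"\bUNSTRING\b", re.IGNORECASE)),
]


def triage(cobol_source: str) -> str:
    """Assign a failing file to the first category whose marker appears in it."""
    for category, pattern in CATEGORY_MARKERS:
        if pattern.search(cobol_source):
            return category
    return "uncategorized"


# Invented sample sources standing in for real failing files.
failing_files = {
    "orders.cbl": "EXEC SQL SELECT NAME INTO :WS-NAME FROM EMP END-EXEC.",
    "menu.cbl": "SCREEN SECTION.",
    "parse.cbl": "UNSTRING WS-LINE DELIMITED BY ',' OR ';' INTO WS-A WS-B.",
    "hello.cbl": "DISPLAY 'HELLO'.",
}
print(Counter(triage(src) for src in failing_files.values()))
```

Anything that lands in "uncategorized" is a candidate for a new marker or a closer manual look.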

How it works

AGUELLID CODE does not translate COBOL to Python directly. Instead, it works in two stages:

Transforms COBOL into a semantic intermediate representation.
Generates Python that is provably equivalent – behavior‑by‑behavior, not line‑by‑line.

There is no neural network, prompt, or sampling involved. The transformation is deterministic: the same input always yields the same output, which can be audited and traced. No black box.
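To make the two-stage idea concrete, here is a toy sketch: a three-statement COBOL subset (`MOVE`, `ADD`, `DISPLAY`) is parsed into small immutable IR nodes, and Python is emitted from those nodes rather than from the source text. The node classes, the name-mangling rule (`WS-COUNT` becomes `ws_count`), and the subset itself are our assumptions; AGUELLID CODE's actual IR is not described in this post, and the sketch ignores real COBOL semantics such as `PIC` truncation.

```python
import ast
import re
from dataclasses import dataclass
from typing import Union


# Hypothetical IR nodes for a tiny COBOL subset.
@dataclass(frozen=True)
class Move:  # MOVE <src> TO <dst>
    src: str
    dst: str


@dataclass(frozen=True)
class Add:  # ADD <src> TO <dst>
    src: str
    dst: str


@dataclass(frozen=True)
class Display:  # DISPLAY <item>
    item: str


Stmt = Union[Move, Add, Display]

_PATTERNS = [
    (re.compile(r"MOVE\s+(\S+)\s+TO\s+(\S+)", re.IGNORECASE), Move),
    (re.compile(r"ADD\s+(\S+)\s+TO\s+(\S+)", re.IGNORECASE), Add),
    (re.compile(r"DISPLAY\s+(\S+)", re.IGNORECASE), Display),
]


def to_ir(line: str) -> Stmt:
    """Stage 1: parse one COBOL sentence into an IR node."""
    text = line.strip().rstrip(".")
    for pattern, node in _PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return node(*match.groups())
    raise ValueError(f"unsupported statement: {line!r}")


def py_name(cobol_name: str) -> str:
    return cobol_name.lower().replace("-", "_")


def operand(token: str) -> str:
    """Render a COBOL operand: numeric literal, quoted literal, or data name."""
    if re.fullmatch(r"-?\d+(?:\.\d+)?", token):
        return token
    if len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]:
        return repr(token[1:-1])
    return py_name(token)


def emit(stmt: Stmt) -> str:
    """Stage 2: generate one Python statement from an IR node."""
    if isinstance(stmt, Move):
        return f"{py_name(stmt.dst)} = {operand(stmt.src)}"
    if isinstance(stmt, Add):
        return f"{py_name(stmt.dst)} += {operand(stmt.src)}"
    return f"print({operand(stmt.item)})"


def translate(cobol: str) -> str:
    ir = [to_ir(line) for line in cobol.splitlines() if line.strip()]
    return "\n".join(emit(stmt) for stmt in ir) + "\n"


program = """
MOVE 5 TO WS-COUNT.
ADD 1 TO WS-COUNT.
DISPLAY WS-COUNT.
"""
python_src = translate(program)
assert python_src == translate(program)  # deterministic: same input, same output
ast.parse(python_src)                    # passes the syntax-only validity check
print(python_src)
```

Every step is a pure function of its input, with no sampling or hidden state, so the IR for any line can be inspected and the output traced back to the statement that produced it.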

Why this matters

An estimated 220 billion lines of COBOL remain in active production.
Many run on systems that organizations struggle to maintain: the original engineers have retired and the documentation is incomplete.
Modernizing this code is a survival issue for many industries, not a stylistic choice.

Current approaches

Manual rewrite – expensive, slow, error‑prone.
LLM translation – non‑deterministic, unauditable, high hallucination risk on legacy syntax.
Transpilers – brittle, shallow, fail on complex constructs.

AGUELLID CODE offers a deterministic, auditable alternative: 98.78% of 15,552 real-world files produce syntactically valid Python, with zero LLMs.

What comes next

The remaining 190 failures map to specific parser gaps. We are prioritizing them by gain/risk ratio; a single micro‑patch for certain TypeCOBOL patterns can recover 20–30 files on its own.
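Ordering a fix backlog by gain/risk ratio is simple to express. In this sketch, `expected_gain` is the number of failing files a patch should fix and `risk` is a made-up regression score; the patch names and numbers are invented for illustration (the 25-file gain echoes the 20–30 file TypeCOBOL estimate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str
    expected_gain: int  # failing files expected to become valid
    risk: float         # hypothetical regression-risk score, must be > 0

    @property
    def ratio(self) -> float:
        return self.expected_gain / self.risk


def prioritize(patches: list[Patch]) -> list[Patch]:
    # Highest gain per unit of risk first; ties broken by name so the order is stable.
    return sorted(patches, key=lambda p: (-p.ratio, p.name))


backlog = [
    Patch("screen-section", expected_gain=15, risk=3.0),
    Patch("nested-unstring", expected_gain=10, risk=4.0),
    Patch("typecobol-qualified-names", expected_gain=25, risk=2.0),
]
for patch in prioritize(backlog):
    print(f"{patch.name}: gain={patch.expected_gain}, risk={patch.risk}, ratio={patch.ratio:.1f}")
```

The tie-breaker on name matters less for correctness than for reproducibility: two runs over the same backlog always yield the same order.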

Target: 99.2%–99.5% success on the full expanded corpus.
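To put the target in concrete terms, assuming the corpus stays at 15,552 files with 15,362 currently valid: reaching 99.2% means fixing at least 66 of the 190 remaining failures, and 99.5% means at least 113. The sketch below uses exact fractions so rounding cannot shift the threshold; the `files_needed` helper is ours.

```python
import math
from fractions import Fraction

TOTAL_FILES = 15_552
VALID_FILES = 15_362  # v5.8e


def files_needed(target_pct: str, total: int = TOTAL_FILES, valid: int = VALID_FILES) -> int:
    """Extra files that must become valid to reach target_pct, using exact arithmetic."""
    required = math.ceil(total * Fraction(target_pct) / 100)
    return max(0, required - valid)


for target in ("99.2", "99.5"):
    remaining = TOTAL_FILES - VALID_FILES
    print(f"{target}%: {files_needed(target)} of the {remaining} failures must be fixed")
```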

The forge is still burning.

KIVUMIA — AGUELLID CODE v5.8e
Validated: 2026‑04‑05 03:27 UTC
Corpus: 131 sources, 15,552 files, 5 continents
Engine: deterministic, zero LLMs
