We validated our COBOL-to-Python engine on 15,552 real-world programs. 98.78% produce valid Python. Zero LLMs involved.

Published: (April 5, 2026 at 12:39 AM EDT)
3 min read
Source: Dev.to

Source: Dev.to

The corpus

15,552 COBOL source files – not synthetic benchmarks, but real programs collected from 131 open‑source repositories across five continents:

  • Norway, France, Brazil, India, Japan, USA
  • GitHub, HuggingFace, CBT Tape, GnuCOBOL, IBM public repositories
  • Commercial COBOL, GnuCOBOL extensions, TypeCOBOL, mainframe dialects

No selection bias, no curated samples – everything we could find.

The result

VersionFiles processedValid PythonFailuresNet gain
v5.6 (before)14,50814,020 (96.84 %)
v5.8e (after)15,552 (+1,044)15,362 (98.78 %)456+1,342 files

On the original v5.7 reference corpus the success rate was 99.25 %; 180 of 289 failures were corrected in a single session.

What “valid Python” means

We do not use LLMs, string comparison, or style checks. Validation is performed with ast.parse():

  • If the generated Python parses without raising a SyntaxError, it is considered valid.
  • If a SyntaxError is raised, it fails.

This binary, deterministic check leaves no room for interpretation or hallucination.

What fails and why

190 files still fail. Their categories are:

CategoryApprox. countTypical issues
TypeCOBOL~60Multi‑level qualifications, REPLACE, typed expressions
GnuCOBOL extensions~40GUI, bitwise composition, OO, SCREEN SECTION
Non‑standard COBOL~30WebSocket, brainfuck interpreter, .NET GUI
Deep STRING/UNSTRING~25Exotic mainframe constructs
Exotic mainframe~35Complex nesting, multiple delimiters
CICS inline / EXEC SQLComplex EXEC SQL, nested copybooks

These are not parsing bugs but constructions that sit at the outer boundary of what any standard COBOL parser can handle. The sanitizer cannot fix what the parser never understood. We are actively working on them.

How it works

AGUELLID CODE does not translate COBOL to Python directly. It:

  1. Transforms COBOL into a semantic intermediate representation.
  2. Generates Python that is provably equivalent – behavior‑by‑behavior, not line‑by‑line.

There is no neural network, prompt, or sampling involved. The transformation is deterministic: the same input always yields the same output, which can be audited and traced. No black box.

Why this matters

  • An estimated 220 billion lines of COBOL remain in active production.
  • Most run on systems that organizations can no longer maintain; the original engineers are retired and documentation is incomplete.
  • Modernizing this code is a survival issue for many industries, not a stylistic choice.

Current approaches

  • Manual rewrite – expensive, slow, error‑prone.
  • LLM translation – non‑deterministic, unauditable, high hallucination risk on legacy syntax.
  • Transpilers – brittle, shallow, fail on complex constructs.

AGUELLID CODE offers a deterministic, auditable solution with 98.78 % success on 15,552 real files and zero LLMs.

What comes next

The remaining 190 failures map to specific parser gaps. We are addressing them by gain/risk ratio; some TypeCOBOL patterns alone can recover 20–30 files with a single micro‑patch.

Target: 99.2 %–99.5 % success on the full expanded corpus.

The forge is still burning.


KIVUMIA — AGUELLID CODE v5.8e
Validated: 2026‑04‑05 03:27 UTC
Corpus: 131 sources, 15,552 files, 5 continents
Engine: deterministic, zero LLMs

0 views
Back to Blog

Related posts

Read more »