infinityabundance/gnucobol-rs

gnucobol-rs

Generated document (TRUST.4.DOCS). Machine authority: reports/claim-ladder.json + reports/casefiles/. Legacy source preserved losslessly under research/legacyreports/README.md.

gnucobol-rs ports the entire GnuCOBOL 3.2 libcob runtime (all 13 admitted .c files) 1:1 into safe Rust, proven byte-identical to a pinned, locally-built GnuCOBOL 3.2 oracle, and ships a clean-room interpreter (cobrun) that parses and executes real COBOL programs on that runtime. No C is linked. It is not a library with tests; it is a compatibility court. "Correct" never means "our reading of a spec"; it means byte-for-byte identical to the admitted cobc (GnuCOBOL) 3.2.0. Every claim is mechanically chained to a replayable receipt, and every boundary is stated as loudly as every capability.

COBOL's bedrock is its byte layout — COMP-3, zoned decimal, edited PICTURE, fixed-record offsets — far more than its syntax. gnucobol-rs reproduces that bedrock exactly, then runs a verified slice of the language on top of it. The discipline is the product: nothing is asserted because we read the spec that way; a claim is admitted only when a live differential sweep against the built oracle produces identical bytes, and sealed only when its evidence — fixtures, receipt, SARIF, in-toto statement, and explicit non-claims — is committed and mechanically re-derivable. If the bytes would diverge today, the gate goes red and publishing is blocked. The core crate is #![forbid(unsafe_code)].
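To make the byte-layout point concrete, here is a minimal, standalone sketch of COMP-3 (packed decimal) encoding for a signed field. It is not the crate's API: the function name `pack_comp3` and its signature are hypothetical, and it assumes the common convention of two digits per byte with a trailing sign nibble (0xC positive, 0xD negative). Unsigned fields, which conventionally carry 0xF, are left out.

```rust
/// Pack `value` into a signed COMP-3 field holding `digits` decimal digits.
/// Hypothetical helper for illustration only; not part of gnucobol-rs.
/// Returns None when the magnitude does not fit in `digits` digits.
fn pack_comp3(value: i64, digits: usize) -> Option<Vec<u8>> {
    let mut magnitude = value.unsigned_abs();
    // One nibble per digit plus one sign nibble, rounded up to whole bytes.
    let len = (digits + 2) / 2;
    let mut out = vec![0u8; len];
    // The sign lives in the low nibble of the last byte.
    out[len - 1] = if value < 0 { 0x0D } else { 0x0C };
    for i in 0..digits {
        let digit = (magnitude % 10) as u8;
        magnitude /= 10;
        // Nibble position counted from the right; position 0 is the sign.
        let pos = i + 1;
        let byte = len - 1 - pos / 2;
        if pos % 2 == 1 {
            out[byte] |= digit << 4; // high nibble
        } else {
            out[byte] |= digit; // low nibble
        }
    }
    if magnitude != 0 {
        return None; // value has more digits than the field
    }
    Some(out)
}
```

Under these assumptions, +12345 in a five-digit field packs to the three bytes 12 34 5C. A single wrong sign nibble changes the stored value, which is why the courts compare bytes rather than decoded numbers.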


What it is, in one minute

  • A native-Rust libcob — the GnuCOBOL 3.2 runtime, ported statement-by-statement and oracle-sealed.
  • A turn-key interpreter (cobrun) — feed it a .cob file, it runs, output matches cobc -x byte-for-byte across the corpus sweep. No cobc, no libcob linked.
  • A C-ABI shim (gnucobol-rs-ffi) — drop it in where you would link libcob (cob_move, cob_get_int, …).
  • A compatibility court — 137 sealed courts, each backed by a forensic case file and a one-command replay.

As of gnucobol-rs 0.8.17 (2026-06): 13/13 libcob files ported 1:1 · 110/110 intrinsics in the runtime · 137 sealed courts · MSRV 1.74. The living current-state authority is STATUS.md — when it disagrees with this page, it wins.


The scope, honestly

Three independent parity views cross-check the runtime so a doc-comment can never masquerade as a port — all at 100% / 0 gaps:

  • 13/13 libcob source files ported 1:1 to safe Rust, oracle-sealed — call, cconv, common, cobgetopt, fileio, intrinsic, mlio, move, numeric, reportio, screenio, strings, termio. Not a subset of the runtime. The runtime.
  • Doxygen C-parse view: 998/998 functions.
  • Typed C↔Rust symbol-map view: 1156 active + 13 inactive mirrors, 0 missing.
  • Gap ledger (GAP-ANALYSIS.md): 105 gaps catalogued, all 105 fixed, 0 open.
  • All 110 intrinsic functions ported as cob_intr_*; the front-end evaluates 94 of them byte-identical to cobc.

Live coverage is generated and gated, never asserted by hand: COBOL-PARITY.md enumerates every verb / intrinsic / clause and what actually runs; FILE-PARITY.md accounts for every GnuCOBOL 3.2 source file with no unevidenced gaps.


See it run

A small program, executed by cobrun on the ported Rust runtime — no cobc, no libcob linked:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PRINCIPAL  PIC 9(5)V99 VALUE 1000.00.
       01  RATE       PIC V999    VALUE 0.075.
       01  INTEREST   PIC 9(5)V99.
       01  OUT-LINE   PIC $$$,$$9.99.
       PROCEDURE DIVISION.
           COMPUTE INTEREST ROUNDED = PRINCIPAL * RATE
           MOVE INTEREST TO OUT-LINE
           DISPLAY "INTEREST: " OUT-LINE
           STOP RUN.

$ cobrun demo.cob
INTEREST:    $75.00

That stdout is byte-identical to cobc -x demo.cob && ./demo — proven, not asserted: every program in the front-end corpus is compiled and run through the admitted real cobc and through cobrun, with cmp -s requiring identical stdout and the gate requiring zero failures (FAIL=0). Passing programs are additionally diffed against a second oracle, GnuCOBOL 3.1.2, for version stability.
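The arithmetic in the demo can be followed by hand with scaled integers. The sketch below is illustrative only and is not how the ported runtime is written: `round_to_scale` and `compute_interest` are hypothetical names, and the rounding rule assumed for ROUNDED is round-half-away-from-zero.

```rust
/// Reduce a scaled integer from `from_scale` to `to_scale` decimal places,
/// rounding half away from zero. Hypothetical helper, not the crate's API.
fn round_to_scale(value: i128, from_scale: u32, to_scale: u32) -> i128 {
    assert!(to_scale <= from_scale);
    let divisor = 10i128.pow(from_scale - to_scale);
    let quotient = value / divisor;
    let remainder = value % divisor;
    if remainder.abs() * 2 >= divisor {
        quotient + value.signum()
    } else {
        quotient
    }
}

/// PRINCIPAL is PIC 9(5)V99 (scale 2) and RATE is PIC V999 (scale 3),
/// so the raw product carries scale 5 and INTEREST keeps scale 2.
fn compute_interest(principal_scale2: i128, rate_scale3: i128) -> i128 {
    round_to_scale(principal_scale2 * rate_scale3, 5, 2)
}
```

With PRINCIPAL = 1000.00 (stored as 100000) and RATE = 0.075 (stored as 75), the product is 7500000 at scale 5, which reduces to 7500 at scale 2, that is 75.00.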

Prefer the runtime as a library? The same packed-decimal and MOVE semantics are reachable directly — #![forbid(unsafe_code)], no GMP, no libcob, no FFI — or under libcob's own cob_field C ABI via gnucobol-rs-ffi (drop-in cob_move / cob_get_int / cob_set_int, verified byte-identical to real libcob by tests/verify_vs_libcob.sh).


Who this is for

  • Migration & re-platforming teams who need byte-trust, not vibes: a way to read mainframe COMP-3 / zoned / EBCDIC records in Rust and prove the bytes match what the COBOL runtime produces — a COMP-3 sign nibble, a zoned overpunch, an OCCURS DEPENDING ON length, a truncating MOVE. Any one of these silently corrupts money; here every layout decision is judged against the real compiler's output, not a re-reading of the standard.
  • Rust projects that need COBOL data semantics without linking C: PIC arithmetic, edited pictures, MOVE conversions, fixed-record layout — in a #![forbid(unsafe_code)] crate.
  • Auditors & reviewers who want a replayable evidence trail, not assertions: every claim names its oracle, its fixtures, the version that sealed it, and what would break it; one command replays the lot on your machine.
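As an illustration of the zoned overpunch mentioned above, here is a small standalone decoder for EBCDIC zoned decimal. It is a sketch under stated assumptions, not the crate's API: `decode_zoned_ebcdic` is a hypothetical name, digits are assumed to carry zone 0xF, and the last byte's zone is assumed to carry the sign (0xC or 0xF positive, 0xD negative). GnuCOBOL's native ASCII representation differs and is not modelled here.

```rust
/// Decode an EBCDIC zoned-decimal field with a trailing overpunched sign.
/// Hypothetical helper for illustration; returns None on malformed input.
fn decode_zoned_ebcdic(bytes: &[u8]) -> Option<i64> {
    let (last, body) = bytes.split_last()?;
    let mut value: i64 = 0;
    for &b in body {
        // Every leading byte must be a plain digit: zone F, digit 0-9.
        if b >> 4 != 0xF || (b & 0x0F) > 9 {
            return None;
        }
        value = value.checked_mul(10)?.checked_add((b & 0x0F) as i64)?;
    }
    let digit = last & 0x0F;
    if digit > 9 {
        return None;
    }
    // The zone nibble of the last byte carries the sign.
    let negative = match last >> 4 {
        0xC | 0xF => false,
        0xD => true,
        _ => return None,
    };
    value = value.checked_mul(10)?.checked_add(digit as i64)?;
    Some(if negative { -value } else { value })
}
```

Under these assumptions the bytes F1 F2 D3 decode to -123, while F1 F2 C3 decode to +123: one nibble apart, opposite sign.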

What it is today

Live coverage is generated and gated, never hand-asserted — see COBOL-PARITY.md (every verb / intrinsic / clause and what runs) and FILE-PARITY.md (every GnuCOBOL 3.2 source file, accounted for).

As of gnucobol-rs 0.8.17 (2026-06): 13/13 libcob files ported 1:1 · 110/110 intrinsics in the runtime · 137 sealed courts.

| Layer | State |
| --- | --- |
| libcob runtime | 100% ported (13/13 admitted files), oracle-sealed. MOVE · arithmetic · pure-Rust decimal · COMP-3 / zoned / binary · edited PICTURE · DATA DIVISION layout · COPY/REPLACING · OCCURS DEPENDING ON · files · intrinsics. Every admitted libcob source file is 1:1 in safe Rust, cross-checked by all three parity views. |
| Intrinsic functions | 110/110 ported as cob_intr_* in the runtime; the front-end evaluates 94/110 (85%) in FUNCTION … references, each byte-identical to the oracle. |
| Front-end interpreter (cobrun) | A clean-room COBOL parser + executor on the ported runtime; no cobc, no libcob linked. Runs a growing, sweep-verified subset (58 verbs incl. MOVE / arithmetic / COMPUTE with full + - * / ** grammar, parens & precedence, and ROUNDED; IF/EVALUATE; PERFORM TIMES/UNTIL/VARYING; GO TO … DEPENDING; tables; sequential files; 88-levels; numeric-EDITED output; ~110 dispatched intrinsics) to stdout proven byte-identical to cobc across the corpus sweep. Out-of-subset constructs fail closed. |
| CLI parity | cobrun --runtime-config reproduces cobcrun --runtime-config byte-for-byte (native port of libcob's print_runtime_conf, common.c:9762); plus -std=NAME, -fixed/-free, and an honest --version that identifies as gnucobol-rs reproducing GnuCOBOL 3.2.0, never masquerading as cobc. |
| C ABI | gnucobol-rs-ffi exposes the ported algorithms under libcob's cob_field C ABI (drop-in cob_move / cob_get_int / cob_set_int / …), verified byte-identical to real libcob. |

Of 66 COBOL verbs total, the runtime backs 51 (77%) and the front-end runs 58 (88%) — enumerated live in COBOL-PARITY.md.


What it does NOT claim

State the boundaries as loudly as the capabilities — they are the credibility. The runtime port is complete (100% across all three parity views); the remaining frontier is the interpreter front-end. Crucially, gnucobol-rs distinguishes a latent-work item (a future task) from a BOUNDARY (a surface the GnuCOBOL 3.2 oracle itself cannot run, so there is no byte-truth to match).

  • No native code generation. gnucobol-rs interprets directly on the ported runtime; it does not reproduce cobc's C-emission (codegen.c / codeoptim.c) — a deliberate non-goal, not a gap. cobrun <file> equals cobc -x <file> only at observable stdout, never in artifacts. Correctness is oracle-judged, not self-asserted.
  • The front-end is a verified subset, not the whole grammar. The parser / scanner / preprocessor / typecheck reproduce a sweep-verified slice (byte-identical over the 145-program sweep). Out-of-subset constructs — group items, OCCURS/REDEFINES, non-01 levels, unlisted verbs — return a typed RunError and exit 2, never a silent mis-run.
  • 16 intrinsics not wired into the front-end — classified boundaries, not latent work: 15 have no fixed oracle output (cobc rejects them — DISPLAY-OF, NATIONAL-OF, BOOLEAN-OF-INTEGER, BINOP, STANDARD-COMPARE, separators; MODULE-PATH is a compiled-binary artifact), and RANDOM is a deliberate boundary (it depends on GMP's internal Mersenne-Twister; the port does not link or reproduce libgmp internals).
  • 8 verbs unrun — every one an explicit BOUNDARY the 3.2 oracle itself cannot run (5 COMMUNICATION SECTION verbs, ACUCOBOL GUI INQUIRE/MODIFY, ENTRY in a nested program).
  • cobc's diagnostic / help text is not reproduced byte-for-byte — the interpreter has its own error model (a deliberate scope limit). Localized (non-C.UTF-8) runtime messages are an explicit off-oracle non-claim. cob-config is documented but not provided; the cobcrun dynamic module loader / CALL dispatch reproduce a documented subset.
  • Declared non-goals: cobc's C codegen, and USAGE NATIONAL (UTF-16) — which GnuCOBOL 3.2 itself marks unfinished and won't compile in the explicit form.
  • Independent project. Reproduces GnuCOBOL 3.2; not affiliated with or endorsed by the GNU project. The runtime is a faithful copyleft derivative, not clean-room — see License & derivation below.
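The fail-closed rule for out-of-subset constructs can be pictured with a small sketch. Only the name RunError and the exit status 2 come from the text above; the variant `Unsupported`, the helper names, the verb list, and the success status 0 are assumptions made for illustration and do not describe the interpreter's real types.

```rust
/// A typed refusal. The variant shape is hypothetical.
#[derive(Debug, PartialEq)]
enum RunError {
    Unsupported(String),
}

/// Accept only verbs on an explicit allow-list; anything else is an error,
/// never a best-effort guess. The list here is a tiny illustrative subset.
fn check_verb(verb: &str) -> Result<(), RunError> {
    const SUPPORTED: &[&str] = &["MOVE", "COMPUTE", "DISPLAY", "STOP"];
    if SUPPORTED.contains(&verb) {
        Ok(())
    } else {
        Err(RunError::Unsupported(verb.to_string()))
    }
}

/// Map the outcome to a process exit status: 2 on refusal, 0 assumed on success.
fn exit_code(result: &Result<(), RunError>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(_) => 2,
    }
}
```

The design point is that an unknown construct produces a value the caller must handle, so a program outside the verified subset cannot run to completion with silently wrong output.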

The machine-readable registry of every refused surface is reports/negative-capabilities.json (human ledger: docs/negative-capabilities.md). A sealed governance invariant enforces that negative claims must be ≥ positive claims — currently 573 refused surfaces.


How "correct" is defined: the admitted oracle

There is exactly one source of truth: upstream GnuCOBOL 3.2 (cobc + libcob). Because it is not installed system-wide, it is built from pinned source (research/gnucobol-3.2.tar.lz, sha256 recorded in reports/admission/) into a gitignored lab/oracle/prefix — x86-64, C.UTF-8, default dialect, COMP big-endian. Every byte claim in this repo means matches that built binary on a fresh run. The oracle's ABI / dialect / config is itself bound as evidence (GNURUST.BUILD.PROFILE.1). When the oracle is present, the documentation gate re-runs ~20 live differential sweeps and requires FAIL=0 on a fresh run before anything ships.


Verify it yourself (≈ 10 minutes)

One command replays every sealed court and prints a PASS table:

bash lab/verify-sealed-courts.sh      # all differential sweeps + shim suite + doc-gate (needs the built oracle)

No oracle built yet? The self-contained court tests run with zero external dependencies, and oracle-dependent checks degrade to typed skipped — never a silent pass:

cargo test                                  # self-contained court tests, no oracle needed
cargo run -p xtask -- receipt check         # receipts == live replay, no hand-edits
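The "typed skipped, never a silent pass" idea can be sketched as a three-state outcome. This is an illustration, not the harness's real data model: `Outcome`, `Tally`, `tally`, and `gate_is_green` are hypothetical names.

```rust
/// Result of one check. A skip carries its reason and is counted separately.
#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,
    Fail,
    Skipped(&'static str),
}

#[derive(Debug, Default, PartialEq)]
struct Tally {
    pass: usize,
    fail: usize,
    skipped: usize,
}

/// Count each outcome in its own bucket so skips never inflate the pass count.
fn tally(outcomes: &[Outcome]) -> Tally {
    let mut t = Tally::default();
    for outcome in outcomes {
        match outcome {
            Outcome::Pass => t.pass += 1,
            Outcome::Fail => t.fail += 1,
            Outcome::Skipped(_) => t.skipped += 1,
        }
    }
    t
}

/// The gate condition described in this README: zero failures (FAIL=0).
fn gate_is_green(t: &Tally) -> bool {
    t.fail == 0
}
```

Keeping skips in their own bucket means a run without the oracle reports honestly how much was not checked, rather than presenting those checks as passes.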

The front-end's own proof harness compiles and runs every program in lab/corpus/frontend/ through both real cobc and cobrun, requiring byte-identical stdout (passing programs are also differentially checked against GnuCOBOL 3.1.2, with documented per-program exemptions):

bash lab/oracle/cobol_frontend_sweep.sh     # 145-program byte sweep, gate FAIL=0
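What "byte-identical stdout" means for a single program can be sketched as a comparison that reports the first diverging offset. The sweep itself uses cmp -s as described above; the function `first_divergence` is a hypothetical illustration and not part of the repository.

```rust
/// Return the offset of the first byte where the two outputs differ,
/// or None when they are identical in content and length.
fn first_divergence(expected: &[u8], actual: &[u8]) -> Option<usize> {
    let common = expected.len().min(actual.len());
    // Compare the shared prefix byte by byte.
    if let Some(i) = expected
        .iter()
        .zip(actual.iter())
        .position(|(a, b)| a != b)
    {
        return Some(i);
    }
    // Same prefix but different lengths: they diverge where the shorter ends.
    if expected.len() != actual.len() {
        Some(common)
    } else {
        None
    }
}
```

A trailing space or a missing newline counts as a divergence here, just as it would under cmp, which is the point of judging output by bytes rather than by appearance.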

Reviewer entry points: STATUS.md (live current-state authority) · COBOL-PARITY.md / FILE-PARITY.md (live language + file coverage) · GAP-ANALYSIS.md (105/105 catalogued gaps fixed, 0 open) · reports/negative-capabilities.json (non-claims) · reports/casefiles/ (137 forensic case files).


Compatibility as a stack of courts

gnucobol-rs treats COBOL compatibility as a stack of separately admitted courts — bytes, moves, the field model, record layout, initialization, comparison, formatting, source expansion, runtime lifecycle, files, intrinsics, and the executing front-end — where no lower layer is allowed to imply a higher one. Passing the decimal-MOVE court says nothing about files; each court stands on its own evidence, naming its byte domain, oracle, fixture count, sealing version, and what would break it. A few marquee courts:

| Court | What it seals |
| --- | --- |
| GNURUST.2 | decimal MOVE: COMP-3 / zoned / display byte conversions |
| GNURUST.3 | the PIC field model (incl. P-scaling) |
| GNURUST.7 / GNURUST.13 | arithmetic ADD/SUB/MUL · packed BCD add/subtract |
| GNURUST.16 / GNURUST.16C | edited-PICTURE decode and encode |
| GNURUST.FILEIO.* | sequential / relative / indexed / SORT runtime I-O (INDEXED on a pure-safe-Rust B-tree) |
| GNURUST.INTRINSIC.* | LENGTH, NUMVAL(-C), INTEGER, MOD/REM, CASE, ORD/CHAR, dates |
| GNURUST.LINEAGE.CORPUS.20M.1 | a completed 20M real-cobc COBOL-witness lineage run |
| GNURUST.FRONTEND.1 | the clean-room front-end: parse + execute a subset to cobc-identical stdout |

Of the 137 sealed courts, 101 are GNURUST.* (the open LGPL runtime + front-end layer); 31 are KOBOLD.*; the remainder are framework courts. The machine-readable form is reports/claim-ladder.json, and each court's full forensic record (casefile.json + SARIF 2.1.0 + in-toto v1 + DSSE envelope) lives under reports/casefiles/.

The full 137-court ledger is in docs/sealed-courts.md.

The KOBOLD.* courts belong to downstream, independently-written Apache-2.0 crates that use this runtime (operator trust layer, fixed-record reconciliation, banking packets). They ship and are documented in their own repositories, not here.


Breadth of verification

  • 106 differential oracle sweep scripts (lab/oracle/*sweep*.sh), each byte-for-byte vs the admitted GnuCOBOL 3.2, spanning arithmetic/numeric, edited/PICTURE/encoding, data movement/tables, control flow, files/I-O, strings, screen I/O (9 native terminal-byte courts), intrinsics/dates, and CALL/interop.
  • Corpus (lab/corpus/): ~679 real COBOL programs across 7 subdirectories plus the 4.3 MB NIST COBOL-85 validation suite (newcob.val.Z, held under the GNURUST.CCVS85.1 custody gate) — including 533 programs from a public GnuCOBOL test corpus, 53 from an open banking suite, and the 145 hand-authored front-end programs.
  • Three independent parity maps cross-check so a doc-comment can never masquerade as a port: DOXYGEN-PARITY (998/998 fns), LIBCOB-PARITY / PORT-INDEX (typed C↔Rust symbols, 100% active), and CLANG-AST-PARITY (870 defs / 3240 call edges). See COBOL-PARITY.md, FILE-PARITY.md, DOXYGEN-PARITY.md, LIBCOB-PARITY.md, CLANG-AST-PARITY.md, FUNCTION-EVIDENCE.md, and the 0–7 PORTING-LADDER.md (level 7 = compiler replacement, explicitly NOT CLAIMED; level is evidence shape, not quality).

Crates

| Crate | Derives from | License | Scope | unsafe |
| --- | --- | --- | --- | --- |
| gnucobol-rs | libcob/*.c (move, numeric, common, intrinsic, fileio, …) | LGPL-3.0-or-later | the native-Rust libcob runtime (1:1, oracle-sealed) + the cobrun interpreter front-end | forbid |
| gnucobol-rs-ffi | libcob common.h C ABI | LGPL-3.0-or-later | a cob_field C-ABI shim; link it where you would link libcob | allow (confined to the raw-pointer shim) |
| cobc-oracle-rs | drives cobc (no GPL code copied) | GPL-3.0-or-later | spawns cobc, captures deterministic canonical-JSON receipts | forbid |

The runtime additionally depends on gnucobol-rs-bdb-format (a pure-safe-Rust Berkeley DB B-tree, for INDEXED organization).


License & derivation boundary

Legal skim, in one line: the runtime is a faithful LGPL-3.0-or-later derivative of libcob — ported statement-by-statement with upstream line citations (e.g. // move.c:477), so it is not clean-room and its license was not freely chosen; the cobrun front-end is clean-room but ships inside the LGPL crate; the cobc-driving tooling is GPL-3.0-or-later.

Because the runtime inherits upstream copyleft, a distributed binary that statically links the LGPL core is a Combined Work under LGPL-3.0 §4. The FSF copyright and original-author credits (Nishida, While, Sobisch, et al.) are retained in every ported file header; COPYING.LESSER (LGPL v3) and COPYING (GPL v3) are shipped. Do not describe the runtime as "clean-room" — the authoritative statement is docs/derivation-and-license.md, with the boundary detailed in docs/license-boundaries.md. The permissive downstream satellites (kobold-*, Apache-2.0) are independently written and merely use the core.

This is an independent effort and is not the upstream GnuCOBOL project, nor endorsed by it.


Project status, features & MSRV

  • Feature flags never change semantics — they gate optional surfaces (e.g. parallelism), never the bytes a court seals.
  • Workspace: resolver 2, MSRV 1.74 (inherited), workspace lint unsafe_code = "forbid", [profile.release] overflow-checks = true.
  • The runtime port is complete across all three parity views; active development is in the front-end interpreter (the 25 PARTIAL files under cobc/ + bin/), tracked live in COBOL-PARITY.md and bounded by the non-claims above.

Method

Admit pinned source → read it → build the real upstream as an executable oracle → port faithfully with citations → prove byte parity over a fixture matrix + differential sweep → pin or classify every confounder → Kani the sharp invariants → fuzz the hostile surface → gate → seal with receipts and exact non-claims. Each byte court carries both a Kani proof and a fuzz target (gate-enforced), plus a forensic case file. A documentation refresh gate (lab/check-docs.sh, cargo run -p xtask -- docs check) runs alongside fmt/clippy/test/sweep and fails if any doc, receipt, or coverage map drifts from the code or the oracle — so nothing goes stale as the register grows. Depth on method, taxonomy, and the risk register lives in docs/.


Repository map

| Map | What it answers |
| --- | --- |
| STATUS.md | live current-state authority (it wins on any disagreement) |
| COBOL-PARITY.md | every verb / intrinsic / clause and exactly what runs |
| FILE-PARITY.md | every GnuCOBOL 3.2 source file accounted for, 0 unevidenced gaps |
| GAP-ANALYSIS.md | the catalogued C→Rust gap ledger (105/105 fixed, 0 open) |
| DOXYGEN-PARITY.md | the Doxygen C-parse function-parity view (998/998) |
| LIBCOB-PARITY.md / CLANG-AST-PARITY.md | typed C↔Rust symbol parity · clang-AST def/call-edge parity |
| PORTING-LADDER.md | the 0–7 evidence hierarchy (level 7 explicitly NOT CLAIMED) |
| docs/ | method, taxonomy, derivation/license, risk register, planning |

Generated document — DO NOT EDIT BY HAND. Rendered by xtask from docs-src/README.model.json and machine evidence (reports/claim-ladder.json, reports/casefiles/, Cargo.toml, live receipts). Regenerate with cargo run -p xtask -- docs generate; verify freshness with cargo run -p xtask -- docs check. The prior hand-written README is preserved losslessly under research/legacyreports/README.md; its claims are carried forward in full by this generated body. Counts are dated and version-anchored to STATUS.md, the living authority.
