[Paper] Translating Large-Scale C Repositories to Idiomatic Rust

Published: November 25, 2025 at 01:42 PM EST
Source: arXiv - 2511.20617v1

Overview

The paper introduces Rustine, an end‑to‑end pipeline that automatically translates whole C codebases into idiomatic, safe Rust. By combining lightweight static analysis with targeted code generation, Rustine aims to pair the scalability of simple transpilers with the quality of hand‑crafted Rust rewrites, which makes large‑scale migration feasible for real‑world projects.

Key Contributions

  • Fully automated repository‑level translation that works on C projects ranging from a few dozen to over ten thousand lines of code.
  • High functional equivalence: 87 % of test assertions pass (1,063,099 of 1,221,192), on test suites with average function‑level and line‑level coverage of 74.7 % and 72.2 %.
  • Safety‑first output: generated Rust contains markedly fewer raw pointers, pointer arithmetic, and unsafe blocks than prior tools.
  • Idiomatic Rust: translations score better on Rust linter (clippy) checks, producing code that feels native to Rust developers.
  • Developer‑in‑the‑loop debugging: when full test‑suite equivalence isn’t achieved, engineers can finish the migration in about 4.5 hours using Rustine’s diagnostics as a guide.
  • Comprehensive empirical evaluation against six existing C‑to‑Rust translators on 23 diverse programs.

Methodology

Rustine’s pipeline consists of three loosely coupled stages:

  1. Static C analysis & abstraction – a lightweight front‑end parses the C AST, extracts type information, and builds an intermediate representation (IR) that captures memory layout, control flow, and API boundaries without invoking heavyweight LLMs.
  2. Pattern‑driven Rust synthesis – the IR is fed to a rule‑based generator that maps C constructs to their Rust equivalents. The generator prefers safe abstractions (e.g., slices, Vec, Option) and inserts unsafe only when an operation cannot be expressed safely (e.g., FFI).
  3. Post‑generation refinement – the raw Rust output is run through Clippy and rustfmt, and a test‑driven feedback loop automatically rewrites failing snippets (e.g., by adjusting lifetimes or replacing pointer arithmetic with safe iterator patterns).
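
To make stages 2 and 3 more concrete, the sketch below contrasts a literal, pointer‑based rendering of a small C function with the kind of safe rewrite the pipeline is described as preferring. The C input, the Rust function names (sum_raw, sum_idiomatic, first_positive), and the pairing itself are illustrative assumptions, not output taken from Rustine.

```rust
// Hypothetical C input (illustrative only, not from the paper):
//
//   int sum(const int *a, size_t n) {
//       int s = 0;
//       for (size_t i = 0; i < n; i++) s += *(a + i);
//       return s;
//   }

/// Literal translation: keeps the raw pointer and the pointer
/// arithmetic, so the function has to be `unsafe`.
///
/// # Safety
/// `a` must point to at least `n` readable, initialized `i32` values.
pub unsafe fn sum_raw(a: *const i32, n: usize) -> i32 {
    let mut s: i32 = 0;
    for i in 0..n {
        // Mirrors `*(a + i)` in the C source.
        s += unsafe { *a.add(i) };
    }
    s
}

/// Idiomatic translation: the (pointer, length) pair becomes a slice
/// and the index loop becomes an iterator. No `unsafe` is needed.
pub fn sum_idiomatic(a: &[i32]) -> i32 {
    a.iter().sum()
}

/// A C lookup that returns NULL for "not found" maps naturally to `Option`.
pub fn first_positive(a: &[i32]) -> Option<&i32> {
    a.iter().find(|&&x| x > 0)
}
```

Both sum functions compute the same value for valid input, but only sum_raw pushes a safety obligation onto every caller. The slice version moves the bounds information into the type, which is the kind of change that lowers raw‑pointer and unsafe counts.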

The whole process runs on commodity hardware; no large language model is required, keeping costs low while still producing compilable code.

Results & Findings

  • Compilation success: every translated repository compiled without manual edits.
  • Functional equivalence: 87 % of the original test‑suite assertions held (1,063,099 of 1,221,192), on test suites averaging 74.7 % function‑level and 72.2 % line‑level coverage.
  • Safety metrics: Rustine’s output reduced raw pointer usage by ~68 % and eliminated most pointer arithmetic, cutting the number of unsafe blocks by more than half relative to the best prior tool.
  • Idiomaticity: Clippy warnings dropped from an average of 12 per 1 k lines (baseline tools) to 3 per 1 k lines in Rustine’s output.
  • Human effort: when the automated pass did not reach full test‑suite equivalence, developers used Rustine’s diagnostics to locate and fix the remaining issues in roughly 4.5 hours, far less than the weeks a full manual rewrite can take.

Overall, Rustine outperformed six existing repository‑level translators across safety, readability, and functional correctness.
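
The 87 % figure can be checked directly against the reported assertion counts. The snippet below is only a worked calculation that treats the figure as the aggregate pass ratio; the function and constant names are made up for illustration.

```rust
/// Assertion counts as quoted in this summary.
pub const PASSED: u64 = 1_063_099;
pub const TOTAL: u64 = 1_221_192;

/// Share of `part` in `total`, expressed as a percentage.
pub fn percent(part: u64, total: u64) -> f64 {
    part as f64 / total as f64 * 100.0
}

/// Number of assertions that did not pass.
pub fn failed() -> u64 {
    TOTAL - PASSED
}
```

Here percent(PASSED, TOTAL) evaluates to roughly 87.05, and failed() is 158,093, or about 12.95 % of all assertions.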

Practical Implications

  • Accelerated migration: Companies with legacy C libraries (e.g., networking stacks, embedded drivers) can now generate a solid Rust baseline in hours rather than months, lowering the barrier to adopt Rust’s memory safety guarantees.
  • Cost‑effective security hardening: Since Rustine avoids expensive LLM inference, it can be integrated into CI pipelines for continuous “Rust‑ification” of new C contributions, helping teams catch unsafe patterns early.
  • Better onboarding: The idiomatic Rust output is easier for Rust developers to read and maintain, reducing the learning curve for teams transitioning from C.
  • Toolchain extensibility: Rustine’s rule‑based generator can be customized for domain‑specific APIs (e.g., OpenGL, POSIX), enabling targeted migrations without rewriting the whole pipeline.
  • Open‑source potential: By releasing the pipeline, the community could build a shared repository of translation patterns, further improving coverage for obscure C idioms.
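
One way to picture the extensibility point is a lookup table from C API names to Rust replacements that a team can extend with its own entries. The sketch below is purely hypothetical: RuleTable, its methods, and the example mappings are assumptions made for illustration and say nothing about Rustine’s actual rule format.

```rust
use std::collections::HashMap;

/// Hypothetical rule table: C callee name -> Rust replacement path.
/// This does not reflect Rustine's real rule representation.
pub struct RuleTable {
    rules: HashMap<&'static str, &'static str>,
}

impl RuleTable {
    /// A couple of illustrative default mappings.
    pub fn with_defaults() -> Self {
        let mut rules = HashMap::new();
        rules.insert("strlen", "str::len");
        rules.insert("memcpy", "<[T]>::copy_from_slice");
        Self { rules }
    }

    /// Add or override a mapping for a domain-specific API.
    pub fn register(&mut self, c_name: &'static str, rust_path: &'static str) {
        self.rules.insert(c_name, rust_path);
    }

    /// Look up the replacement for a C callee, if a rule exists.
    pub fn lookup(&self, c_name: &str) -> Option<&'static str> {
        self.rules.get(c_name).copied()
    }
}
```

A team targeting POSIX file APIs could, for example, call register("open", "std::fs::File::open") before running a translation, leaving the rest of the table untouched.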

Limitations & Future Work

  • Partial functional equivalence: about 13 % of test assertions still fail (158,093 of 1,221,192), mainly due to undefined‑behavior edge cases that the static analysis cannot fully model.
  • Limited handling of complex macro‑heavy code: While Rustine parses most macros, heavily metaprogrammed C (e.g., Linux kernel style) may require manual intervention.
  • No automatic performance tuning: The generated Rust is safe but not always optimal; future work could integrate profiling‑guided rewrites to match or exceed the original C performance.
  • Extending to other languages: The authors plan to explore applying the same pipeline to C++ and Objective‑C codebases, which would broaden Rustine’s applicability.
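
Signed integer overflow is a familiar example of the undefined‑behavior edge cases mentioned in the first item. In C, overflowing an int is undefined. In Rust, a plain + panics in debug builds and wraps in release builds by default, so a translator has to pick the intended semantics explicitly. The sketch below shows two such choices; the C input and the function names are illustrative assumptions, not cases reported in the paper.

```rust
// Hypothetical C input:
//
//   int next(int x) { return x + 1; }   /* UB when x == INT_MAX */

/// Makes overflow visible to the caller instead of leaving it undefined.
pub fn next_checked(x: i32) -> Option<i32> {
    x.checked_add(1)
}

/// Two's-complement wraparound, which is what many C programs
/// happen to observe in practice even though C does not guarantee it.
pub fn next_wrapping(x: i32) -> i32 {
    x.wrapping_add(1)
}
```

A test written against the C program may implicitly depend on either behavior, which is one way a translation that compiles cleanly can still fail an assertion.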

Despite these gaps, Rustine demonstrates that large‑scale, automated C‑to‑Rust migration is both technically viable and practically valuable for modern software engineering teams.

Authors

  • Saman Dehghan
  • Tianran Sun
  • Tianxiang Wu
  • Zihan Li
  • Reyhaneh Jabbarvand

Paper Information

  • arXiv ID: 2511.20617v1
  • Categories: cs.SE, cs.PL
  • Published: November 25, 2025