[Paper] Symmaries: Automatic Inference of Formal Security Summaries for Java Programs

Published: December 23, 2025 at 09:33 AM EST

Source: arXiv - 2512.20396v1

Overview

The paper presents Symmaries, a tool that automatically extracts formal security summaries from compiled Java bytecode. These summaries capture, in a compact form, the conditions under which a method can be safely called and how it may propagate or alter sensitive data. By turning low‑level bytecode into high‑level security contracts, the authors aim to give developers and static analysis tools a reliable way to reason about the security impact of third‑party libraries and large codebases.

Key Contributions

Automated generation of method‑level security summaries for Java bytecode, without requiring source code or manual annotations.
A modular, sound analysis framework: the inferred summaries are proven to guarantee termination‑insensitive non‑interference, meaning secret inputs cannot influence public outputs in executions that terminate.
Scalable implementation (Symmaries) that successfully processes real‑world Java applications with hundreds of thousands of lines of code.
Empirical evaluation on popular Java APIs showing that the approach yields useful, precise specifications across different heap abstraction models.
Integration potential: summaries can be fed directly into existing static analysis pipelines or used as documentation for developers reviewing library code.
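
To make the non‑interference property mentioned above more concrete, here is a small sketch of the kinds of flow an information‑flow analysis has to distinguish. The class and method names are invented for illustration and are not taken from the paper or from Symmaries.

```java
// Illustrative only: FlowKinds and its methods are made-up names.
class FlowKinds {
    // Explicit flow: the secret value is copied straight into the result.
    static int explicitFlow(int secret) {
        int pub = secret;
        return pub;
    }

    // Implicit flow: the secret is never copied, but the branch taken
    // depends on it, so the result still reveals one bit of the secret.
    static int implicitFlow(int secret) {
        int pub = 0;
        if (secret > 0) {
            pub = 1;
        }
        return pub;
    }

    // No flow to the result: the secret only reaches a local that is
    // discarded, so changing the secret never changes the return value.
    static int noFlow(int secret, int pubIn) {
        int scratch = secret * 2;
        return pubIn + 1;
    }
}
```

A method is non‑interfering with respect to its result when two calls that differ only in their secret arguments always return the same value; of the three methods above, only noFlow has that property.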

Methodology

Bytecode Extraction – The tool parses Java class files to build a control‑flow graph for each method.
Abstract Interpretation – Using a configurable heap model (e.g., points‑to or shape abstraction), Symmaries performs a forward data‑flow analysis that tracks:
- Pre‑conditions: what security labels (e.g., “confidential”, “public”) must hold on inputs for the method to be considered safe.
- Information‑flow effects: how data may flow from inputs to outputs or to the heap (potential leaks).
- Aliasing updates: changes to object references that could affect later accesses.
Summary Synthesis – The analysis results are collapsed into a concise summary per method, expressed as logical predicates over security labels and heap relations.
Soundness Proof – The authors formalize the analysis and prove that any program respecting the generated summaries satisfies termination‑insensitive non‑interference, i.e., secret data cannot influence public outputs in executions that terminate.
Tool Integration – Summaries are emitted in a machine‑readable format that can be consumed by other static analysis tools (e.g., taint trackers, model checkers).
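
The steps above can be sketched in miniature. The code below is a deliberately simplified assumption, not the paper's algorithm: it handles only straight‑line assignments (no branches, heap, or aliasing), uses a two‑point label lattice, and every type and method name in it (Label, Assign, MiniFlow) is hypothetical. It shows the general idea of a forward pass that records which parameters each variable may depend on, and of a summary that a caller can instantiate with its own labels.

```java
import java.util.*;

// Two-point security lattice: PUBLIC is below CONFIDENTIAL.
enum Label {
    PUBLIC, CONFIDENTIAL;

    // Least upper bound: mixing anything with CONFIDENTIAL gives CONFIDENTIAL.
    Label join(Label other) {
        return (this == CONFIDENTIAL || other == CONFIDENTIAL) ? CONFIDENTIAL : PUBLIC;
    }
}

// One assignment "target = f(sources...)"; hypothetical mini-IR, not bytecode.
record Assign(String target, List<String> sources) {}

class MiniFlow {
    // Forward pass over straight-line code: maps every variable to the set
    // of method parameters its value may depend on.
    static Map<String, Set<String>> dependencies(List<String> params, List<Assign> body) {
        Map<String, Set<String>> deps = new HashMap<>();
        for (String p : params) {
            deps.put(p, new TreeSet<>(Set.of(p)));
        }
        for (Assign a : body) {
            Set<String> d = new TreeSet<>();
            for (String s : a.sources()) {
                // Unknown names (e.g. constants) contribute no dependency.
                d.addAll(deps.getOrDefault(s, Set.of()));
            }
            deps.put(a.target(), d);
        }
        return deps;
    }

    // Instantiates a dependency summary at a call site: the label of a
    // variable is the join of the labels of the parameters it depends on.
    static Label labelOf(Set<String> varDeps, Map<String, Label> paramLabels) {
        Label result = Label.PUBLIC;
        for (String p : varDeps) {
            result = result.join(paramLabels.get(p));
        }
        return result;
    }
}
```

For a body consisting of t = f(msg, key), ret = t, and len = g(msg), the pass records that ret depends on both key and msg while len depends only on msg. With key labeled CONFIDENTIAL and msg labeled PUBLIC, labelOf then yields CONFIDENTIAL for ret and PUBLIC for len. Keeping the summary in terms of parameters rather than fixed labels is what lets it be reused at different call sites.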

Results & Findings

Scalability: Symmaries processed several open‑source Java projects ranging from ~10 K to >300 K lines of code within a few minutes per module, with analysis time growing roughly linearly with code size.
Precision: Depending on the heap abstraction, the tool achieved 70–85% precision in identifying true information‑flow paths, with false positives low enough to be practical.
Coverage: Applied to core Java libraries (e.g., java.io, java.net), Symmaries produced usable security contracts for over 90 % of public methods, revealing undocumented assumptions (e.g., required permissions).
Soundness Validation: The formal proof, complemented by extensive testing on benchmark suites, confirmed that any violation of the generated summaries would correspond to a genuine non‑interference breach.

Practical Implications

Library Vetting: Security teams can automatically generate contracts for third‑party JARs, making it easier to assess whether a library meets an organization’s data‑flow policies before inclusion.
Static Analyzer Boost: Existing tools (e.g., SpotBugs, Find Security Bugs) could ingest Symmaries summaries to prune infeasible paths, reducing analysis time and false‑positive rates.
Developer Documentation: Summaries serve as concise, machine‑verified documentation of a method’s security expectations, helping developers understand hidden side‑effects when reusing code.
Continuous Integration: Integrating Symmaries into CI pipelines can flag newly introduced methods that violate established security contracts, enabling early remediation.
Policy Enforcement: Enterprises can define high‑level security policies (e.g., “no secret data may leave the crypto package”) and automatically verify compliance across the codebase using the generated summaries.
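
As a rough idea of what a CI or policy gate built on such summaries might look like, the sketch below checks a list of flow facts against a simple policy. Everything here is an assumption made for illustration: FlowFact is a hypothetical, heavily simplified stand‑in for a summary entry (the paper's actual output format is not reproduced), and the source and sink strings are invented.

```java
import java.util.*;

// Hypothetical summary entry: "in this method, data may flow from source to sink".
record FlowFact(String method, String source, String sink) {}

class PolicyGate {
    // Returns, in first-seen order, the methods that have at least one flow
    // from a source listed as secret to a sink listed as public.
    static List<String> violations(List<FlowFact> summaries,
                                   Set<String> secretSources,
                                   Set<String> publicSinks) {
        List<String> bad = new ArrayList<>();
        for (FlowFact f : summaries) {
            boolean leaks = secretSources.contains(f.source())
                    && publicSinks.contains(f.sink());
            if (leaks && !bad.contains(f.method())) {
                bad.add(f.method());
            }
        }
        return bad;
    }
}
```

A build step could fail whenever violations returns a non‑empty list for the summaries of newly added or changed methods. A real integration would need to parse the tool's own output and map an organization's policy onto it, which this sketch does not attempt.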

Limitations & Future Work

Heap Model Trade‑offs: The precision of summaries depends heavily on the chosen heap abstraction; more expressive models increase analysis cost.
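Termination‑Insensitive Focus: The current soundness guarantee does not cover leaks through a program's termination behavior (whether it halts or diverges), and related covert channels such as timing are likewise outside the guarantee; both may be relevant for certain high‑assurance systems.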
Dynamic Features: Reflection, dynamic class loading, and native method calls are only partially handled, potentially limiting coverage for frameworks that rely heavily on these mechanisms.
User‑Defined Policies: While the tool produces generic security contracts, mapping them to organization‑specific policy languages remains an open integration step.
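
The termination‑insensitive caveat is easiest to see with a tiny example. The class below is invented for illustration and is not from the paper. Every call that returns yields the same value, so no two terminating runs can be told apart by their output, and the method satisfies termination‑insensitive non‑interference. An observer who can see whether the call returns at all still learns the secret.

```java
// Illustrative only: TerminationLeak is a made-up name.
class TerminationLeak {
    // Always returns 0 when it returns, so the output carries no secret data.
    // With secret == true the loop never exits and the call never returns.
    static int probe(boolean secret) {
        while (secret) {
            // spin forever
        }
        return 0;
    }
}
```

Calling probe(false) returns 0 immediately; probe(true) should not be run outside a thread that can be abandoned, since it does not terminate.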

Future research directions include extending the analysis to capture termination‑sensitive non‑interference, improving support for reflective code, and building a higher‑level policy language that can directly consume Symmaries summaries for automated compliance checking.

Authors

Narges Khakpour
Nicolas Berthier

Paper Information

arXiv ID: 2512.20396v1
Categories: cs.CR, cs.FL, cs.PL, cs.SE
Published: December 23, 2025
