[Paper] Extractive summarization on a CMOS Ising machine

Published: January 16, 2026 at 01:14 PM EST
4 min read
Source: arXiv - 2601.11491v1

Overview

Extractive summarization (ES) picks the most important sentences from a document to create a concise summary. This paper shows how a low‑power CMOS‑based Ising machine—an analog hardware accelerator that solves combinatorial optimization problems—can run a state‑of‑the‑art ES algorithm with dramatically lower energy use and comparable speed to conventional CPU/GPU approaches, opening the door to real‑time summarization on edge devices.

Key Contributions

  • Hardware‑aware Ising formulation that balances local fields and couplings, making the ES problem tolerant to the limited precision of integer‑only spin interactions.
  • Stochastic rounding & iterative refinement pipeline that recovers accuracy lost during coefficient quantization.
  • Problem decomposition strategy that splits large ES instances into smaller sub‑problems solvable on the CMOS Ising chip and then recombines the partial solutions.
  • Empirical validation on the CNN/DailyMail benchmark demonstrating 3–4.5× speed‑up over brute‑force search, energy savings of 2–3 orders of magnitude, and summary quality on par with software‑based Tabu search.

Methodology

  1. Mapping ES to an Ising model – The classic McDonald ES objective (maximize relevance, minimize redundancy) is expressed as a quadratic unconstrained binary optimization (QUBO) problem. Each binary variable indicates whether a sentence is selected.
  2. Coefficient scaling – The authors introduce a scaling trick that reduces the disparity between the “local field” terms (sentence relevance) and the pairwise coupling terms (redundancy). This makes the integer‑only hardware less sensitive to rounding errors.
  3. Stochastic rounding – Instead of deterministic truncation, each real‑valued coefficient is probabilistically rounded to the nearest integer, preserving the expected value of the original model.
  4. Iterative refinement – After the Ising solver returns a candidate summary, a lightweight post‑processing step re‑evaluates the objective and flips a few bits if it improves the score.
  5. Decomposition – For documents with many sentences, the full QUBO would exceed the chip’s capacity. The pipeline partitions the sentence set into overlapping windows, solves each window independently on the hardware, and merges the results using a greedy selection that respects the global budget (k sentences).
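Steps 1, 3, and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the penalty weight, the scaling factor, and the brute-force solve (standing in for the Ising chip) are all assumptions for demonstration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def build_qubo(relevance, similarity, k, penalty=2.0):
    """McDonald-style ES objective as a QUBO: reward sentence relevance,
    penalize pairwise redundancy, and softly enforce a budget of k sentences
    by expanding the quadratic penalty term penalty * (sum(x) - k)**2."""
    n = len(relevance)
    Q = np.zeros((n, n))
    for i in range(n):
        # Linear ("local field") terms: relevance plus the linear part of the penalty.
        Q[i, i] = -relevance[i] + penalty * (1 - 2 * k)
    for i in range(n):
        for j in range(i + 1, n):
            # Pairwise couplings: redundancy plus the cross terms of the penalty.
            Q[i, j] = similarity[i][j] + 2 * penalty
    return Q

def stochastic_round(Q, scale=4):
    """Probabilistically round scaled coefficients to integers so that the
    expected value of each rounded coefficient equals the original."""
    S = Q * scale
    floor = np.floor(S)
    frac = S - floor
    return floor + (rng.random(S.shape) < frac)

def energy(Q, x):
    """QUBO objective x^T Q x (Q stored upper-triangular)."""
    return x @ Q @ x

def brute_force(Q):
    """Exhaustive solve; a stand-in for the CMOS Ising chip on tiny instances."""
    n = Q.shape[0]
    return min((np.array(bits) for bits in itertools.product([0, 1], repeat=n)),
               key=lambda x: energy(Q, x))

def refine(Q, x, sweeps=2):
    """Step 4: greedy single-bit-flip refinement on the original, unrounded QUBO."""
    x = x.copy()
    for _ in range(sweeps):
        for i in range(len(x)):
            flipped = x.copy()
            flipped[i] = 1 - flipped[i]
            if energy(Q, flipped) < energy(Q, x):
                x = flipped
    return x
```

In this toy pipeline, the hardware would receive the stochastically rounded integer matrix, and the returned candidate summary would then be refined against the full-precision objective on the host.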
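The decomposition-and-merge strategy of step 5 can likewise be sketched as below. The window size, stride, the top-half heuristic standing in for the on-chip sub-problem solve, and the relevance-minus-redundancy merge rule are all illustrative assumptions, not details from the paper.

```python
def decompose_and_merge(relevance, similarity, k, window=8, stride=4):
    """Partition the sentence set into overlapping windows, solve each window
    locally (here: keep the top half of each window by relevance, as a stand-in
    for the on-chip Ising solve), then greedily assemble a global k-sentence
    summary that trades relevance against redundancy with sentences already chosen."""
    n = len(relevance)
    candidates = set()
    for start in range(0, max(n - window, 0) + 1, stride):
        idx = list(range(start, min(start + window, n)))
        # Local stand-in for solving this sub-problem on the hardware.
        top = sorted(idx, key=lambda i: relevance[i], reverse=True)[: max(1, len(idx) // 2)]
        candidates.update(top)
    # Greedy merge under the global budget of k sentences.
    chosen = []
    pool = sorted(candidates)
    while len(chosen) < k and pool:
        def gain(i):
            # Marginal value: relevance minus worst-case redundancy with the summary so far.
            redundancy = max((similarity[i][j] for j in chosen), default=0.0)
            return relevance[i] - redundancy
        best = max(pool, key=gain)
        chosen.append(best)
        pool.remove(best)
    return sorted(chosen)
```

Overlapping windows ensure that redundancy between nearby sentences is seen by at least one sub-problem, while the final greedy pass enforces the global budget that no single window can see.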

Results & Findings

| Metric | COBI (proposed) | Brute‑force | Software Tabu search |
|---|---|---|---|
| Runtime (relative) | 1× (reference) | 3–4.5× slower | ≈1× (similar) |
| Energy consumption | 100–1000× lower | Baseline | Baseline |
| ROUGE‑1/2/L scores* | 0.38 / 0.16 / 0.34 | 0.40 / 0.17 / 0.35 | 0.39 / 0.16 / 0.35 |

*Scores are on the CNN/DailyMail test set; differences are within the typical variance of ES models.

The hardware‑aware formulation and stochastic rounding keep summary quality within a few percentage points of the software baselines while delivering a 3–4.5× speed‑up over brute‑force search and 100–1000× lower energy consumption.

Practical Implications

  • Edge summarization – Mobile phones, IoT gateways, or autonomous robots can generate news briefs, incident reports, or log summaries without offloading data to the cloud, preserving privacy and reducing latency.
  • Energy‑constrained deployments – Battery‑operated devices (e.g., wearables, drones) can run NLP pipelines that were previously limited to cloud inference because the CMOS Ising engine consumes milliwatts instead of watts.
  • Accelerated combinatorial NLP – The same hardware‑aware Ising encoding can be reused for other selection‑type tasks (keyword extraction, document clustering, feature selection), providing a reusable accelerator block for a family of low‑power AI workloads.
  • Hybrid AI stacks – Developers can keep heavy language model inference on GPUs while delegating the lightweight combinatorial post‑processing (e.g., sentence selection) to an on‑chip Ising solver, achieving a balanced compute‑energy trade‑off.

Limitations & Future Work

  • Scalability bound by chip size – The current COBI prototype handles up to a few dozen sentences per sub‑problem; larger documents still require aggressive decomposition, which may introduce sub‑optimality.
  • Precision constraints – Although stochastic rounding mitigates quantization loss, the approach still depends on careful coefficient scaling; extending to mixed‑precision or floating‑point couplings could improve robustness.
  • Generalization to abstractive models – The study focuses on extractive summarization; applying Ising‑based optimization to end‑to‑end generative summarizers remains an open challenge.
  • Hardware availability – CMOS Ising machines are still emerging; broader adoption will need standardized APIs and integration with existing ML frameworks.

Overall, the paper demonstrates that analog combinatorial hardware can meaningfully accelerate a core NLP task, paving the way for ultra‑low‑power AI at the edge.

Authors

  • Ziqing Zeng
  • Abhimanyu Kumar
  • Chris H. Kim
  • Ulya R. Karpuzcu
  • Sachin S. Sapatnekar

Paper Information

  • arXiv ID: 2601.11491v1
  • Categories: cs.LG, cs.ET
  • Published: January 16, 2026