[Paper] Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Published: March 4, 2026 at 12:55 PM EST

Source: arXiv - 2603.04337v1

Overview

The paper introduces Pointer‑CAD, a new framework that lets large language models (LLMs) generate and edit full‑featured CAD models. By augmenting the traditional command‑sequence representation with pointer operations that directly select edges, faces, or other B‑rep entities, the system overcomes the long‑standing limitation of “blind” sequence generation and dramatically cuts topological errors caused by discretizing continuous geometry.

Key Contributions

Pointer‑based command language – Extends the usual CAD command stream with explicit “select‑entity” tokens that let the LLM point to a specific face, edge, or vertex in the current B‑rep.
Iterative B‑rep conditioning – Each generation step receives both the natural‑language prompt and the up‑to‑date boundary‑representation, enabling context‑aware edits (e.g., chamfer the selected edge).
Large‑scale annotated dataset – A pipeline that pairs 575 K expert‑level CAD models with high‑quality natural‑language descriptions, providing the training signal needed for pointer prediction.
Quantization‑error mitigation – By selecting existing geometric entities instead of approximating their continuous coordinates, the method reduces segmentation/topology errors by an order of magnitude or more compared with prior sequence‑only approaches.
Comprehensive evaluation – Demonstrates reliable generation of complex parts (multiple features, nested operations) and reports a low failure rate on standard CAD benchmarks (0.3 % invalid B‑reps in the comparison under Results & Findings).
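
To make the pointer-based command language more concrete, here is a minimal sketch of a command stream that mixes ordinary commands with pointer tokens. The class names (`Command`, `Select`), the command names, the `;` separator, and the integer parameters are all illustrative assumptions; the summary only states that a `SELECT <entity_id>` token type exists, not how the paper serializes it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Command:
    """An ordinary CAD command with discretized numeric parameters (assumed form)."""
    name: str
    params: tuple[int, ...] = ()


@dataclass(frozen=True)
class Select:
    """A pointer token: refers to an existing B-rep entity by id, not by coordinates."""
    kind: str        # "face", "edge", or "vertex" (kept for readability only)
    entity_id: int


Token = Union[Command, Select]


def serialize(seq: list[Token]) -> str:
    """Flatten a mixed command/pointer sequence into one token string."""
    parts = []
    for tok in seq:
        if isinstance(tok, Select):
            parts.append(f"SELECT <{tok.entity_id}>")
        else:
            args = " ".join(str(p) for p in tok.params)
            parts.append(f"{tok.name} {args}".strip())
    return " ; ".join(parts)


# Hypothetical example: chamfer edge 7 of an extruded rectangle.
sequence = [
    Command("sketch_rect", (0, 0, 40, 20)),
    Command("extrude", (10,)),
    Select("edge", 7),
    Command("chamfer", (2,)),
]
print(serialize(sequence))
```

The point of the sketch is that the chamfer target is named by identity (`SELECT <7>`) rather than re-described through quantized coordinates, which is where a sequence-only representation can drift from the actual geometry.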

Methodology

Representation – The CAD model is expressed as a command sequence (e.g., sketch, extrude, fillet). Pointer‑CAD adds a new token type SELECT <entity_id> that points to an element in the current B‑rep (edges, faces, vertices).
Model Architecture – A transformer‑based LLM (e.g., GPT‑NeoX) is fine‑tuned to predict the next token given:
- The textual design description.
- A serialized view of the current B‑rep (encoded as a list of entity features).
- The previously generated command tokens.
  The pointer prediction is treated as a classification over the set of available entities.
Training Data Pipeline – Existing CAD repositories are parsed into B‑rep structures, then a semi‑automatic annotator generates paired natural‑language specifications (using GPT‑4 for drafting and human verification). The pipeline also extracts ground‑truth pointers for every operation that involves entity selection.
Inference Loop – Starting from an empty model, the LLM iteratively emits commands. When a SELECT token is produced, the model scores all candidate entities and picks the highest‑scoring one, which is then fed back into the CAD kernel to update the B‑rep before the next step.
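
The inference loop and the pointer-as-classification step described above can be sketched in a few lines. This is a toy under stated assumptions, not the paper's implementation: the query vector stands in for whatever the LLM produces at a SELECT step, entity features are hand-written 2-D vectors, scoring is a plain dot product, and `apply_command` is a stub in place of a real CAD kernel.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def point(query, entities):
    """Pointer step as classification: one logit per candidate entity.

    Requires at least one entity in the current B-rep.
    """
    logits = [sum(q * f for q, f in zip(query, ent["features"])) for ent in entities]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return entities[best]["id"], probs


def apply_command(brep, op, target_id=None):
    """Stub kernel (assumption): returns the updated list of selectable entities."""
    if op == "extrude":
        # Toy result: the new solid exposes three selectable entities.
        return brep + [
            {"id": 0, "kind": "face", "features": [1.0, 0.0]},
            {"id": 1, "kind": "edge", "features": [0.0, 1.0]},
            {"id": 2, "kind": "edge", "features": [0.7, 0.7]},
        ]
    # Any other op just tags the selected entity, if there is one.
    return [dict(e, tag=op) if e["id"] == target_id else e for e in brep]


def run(steps, brep):
    """Emit commands one by one; after each, the B-rep is refreshed for the next step."""
    history, pending = [], None
    for step in steps:
        if step["op"] == "SELECT":
            pending, _ = point(step["query"], brep)
            history.append(f"SELECT <{pending}>")
        else:
            brep = apply_command(brep, step["op"], pending)
            history.append(step["op"])
            pending = None
    return history, brep


# Scripted stand-in for the model's outputs, starting from an empty model.
steps = [
    {"op": "sketch"},
    {"op": "extrude"},
    {"op": "SELECT", "query": [0.1, 2.0]},
    {"op": "fillet"},
]
history, final_brep = run(steps, [])
print(history)
print(final_brep[1])
```

Because candidates are scored against the B-rep as it exists at that step, the set of classes changes after every kernel update, which is what separates this from predicting a fixed-vocabulary token.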

Results & Findings

Metric	Pointer‑CAD	Prior Sequence‑Only (e.g., CAD‑GPT)
Topological error rate (invalid B‑rep)	0.3 %	7.8 %
Chamfer/Fillet success on complex parts	94 %	62 %
Average number of features per generated part	12.4	6.1
Human evaluation (design fidelity)	4.6 / 5	3.8 / 5
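
For readers who prefer relative numbers, the table can be reduced to a few ratios with simple arithmetic. The snippet uses only the values shown in the table; the dictionary keys and variable names are illustrative.

```python
# Values copied from the table above: (Pointer-CAD, sequence-only baseline).
results = {
    "topological_error_pct": (0.3, 7.8),
    "chamfer_fillet_success_pct": (94.0, 62.0),
    "features_per_part": (12.4, 6.1),
    "human_fidelity_score": (4.6, 3.8),
}

ours, base = results["topological_error_pct"]
error_reduction = base / ours                 # how many times fewer invalid B-reps

ours, base = results["chamfer_fillet_success_pct"]
success_gain_points = ours - base             # percentage points, not percent

ours, base = results["features_per_part"]
feature_ratio = ours / base                   # relative feature count

ours, base = results["human_fidelity_score"]
fidelity_gain = ours - base                   # on the 5-point scale

print(f"invalid B-rep rate: {error_reduction:.0f}x lower")
print(f"chamfer/fillet success: +{success_gain_points:.0f} points")
print(f"features per part: {feature_ratio:.2f}x")
print(f"human fidelity: +{fidelity_gain:.1f}")
```

The ratio for invalid B-reps computed here is a different quantity from the roughly 10× reduction quoted for quantization-induced segmentation errors, so the two figures should not be compared directly.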

Error reduction – The pointer mechanism cuts quantization‑induced segmentation errors by ~10×.
Feature richness – Models can reliably chain multiple dependent operations (e.g., sketch → extrude → select face → fillet).
Generalization – On unseen prompts, the system still produces valid B‑reps, indicating that the pointer‑based conditioning learns robust geometric reasoning rather than memorizing fixed command patterns.

Practical Implications

Developer APIs – Pointer‑CAD can be wrapped as a REST service that accepts a natural‑language design brief and returns a standard CAD file (STEP/IGES). This opens the door for “design‑by‑prompt” features in IDE plugins, product configurators, or rapid prototyping tools.
Interactive CAD assistants – Because the model can point to existing geometry, it can be used for in‑situ editing: a user asks “add a 2 mm fillet to the top edge of the bracket,” and the system instantly selects the correct edge and applies the operation.
Reduced manual modeling time – Early experiments suggest a 30‑40 % cut in the number of manual steps required to build complex parts, translating into faster iteration cycles for mechanical engineers and hobbyists alike.
Better downstream simulation – Valid B‑reps mean fewer geometry clean‑up steps before feeding models into finite‑element analysis or 3‑D printing pipelines, improving overall workflow reliability.
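
As a rough illustration of the "design-by-prompt" service idea, the sketch below models only the request and response payloads such a wrapper might exchange. Everything here is hypothetical: the field names, the `DesignRequest` and `DesignResponse` classes, and the reply contents are assumptions for illustration, and no real endpoint or published API is implied. The only detail taken from the text is that the output would be a STEP or IGES file.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_FORMATS = {"STEP", "IGES"}  # the two formats named in the text


@dataclass
class DesignRequest:
    """Hypothetical request body: a natural-language brief plus an output format."""
    prompt: str
    output_format: str = "STEP"

    def __post_init__(self):
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class DesignResponse:
    """Hypothetical reply: validity flag, emitted commands, and the file name."""
    valid_brep: bool
    commands: list[str]
    file_name: str

    @classmethod
    def from_json(cls, text: str) -> "DesignResponse":
        data = json.loads(text)
        return cls(
            valid_brep=bool(data["valid_brep"]),
            commands=list(data["commands"]),
            file_name=str(data["file_name"]),
        )


request = DesignRequest(prompt="L-bracket, 40 x 20 x 10 mm, 2 mm fillet on the top edge")
payload = request.to_json()

# A made-up reply, standing in for what a server could return.
fake_reply = json.dumps({
    "valid_brep": True,
    "commands": ["sketch", "extrude", "SELECT <7>", "fillet"],
    "file_name": "bracket.step",
})
response = DesignResponse.from_json(fake_reply)
print(payload)
print(response)
```

Returning the command sequence alongside the file is a design choice worth considering for such a wrapper, since it lets a client show or replay the individual steps, including which entity each pointer selected.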

Limitations & Future Work

Scalability of entity set – Pointer prediction currently enumerates all faces/edges, which can become costly for very large assemblies; hierarchical or learned indexing could alleviate this.
Dataset bias – The 575 K models are sourced mainly from mechanical parts; architectural or organic shapes may need additional training data.
Fine‑grained parameter control – While pointers remove quantization error from entity selection, continuous parameters (e.g., exact fillet radius) still rely on discretized tokens; future work could integrate differentiable geometry modules to predict real‑valued values.
User intent ambiguity – Vague natural‑language prompts may lead to ambiguous pointer choices; incorporating clarification dialogs or multimodal inputs (sketches, images) is a promising direction.
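
The parameter-control limitation is easy to see numerically. The sketch below snaps a requested fillet radius to an evenly spaced token grid and measures the resulting error. The 0 to 10 mm range and the bin counts are assumptions chosen for illustration; the paper's actual tokenization is not described in this summary.

```python
def quantize(value, lo, hi, levels):
    """Snap `value` to the nearest of `levels` evenly spaced bins on [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    index = round((value - lo) / step)
    index = max(0, min(levels - 1, index))   # clamp out-of-range requests
    return lo + index * step


requested = 2.37            # mm, the radius the user asked for
lo, hi = 0.0, 10.0          # assumed parameter range

for levels in (64, 256, 1024):
    snapped = quantize(requested, lo, hi, levels)
    worst_case = (hi - lo) / (levels - 1) / 2   # half a bin width
    print(f"{levels:5d} levels: {snapped:.4f} mm "
          f"(error {abs(snapped - requested):.4f}, bound {worst_case:.4f})")

snapped_256 = quantize(requested, lo, hi, 256)
```

More bins shrink the worst-case error but never remove it, and each extra bin enlarges the vocabulary the model must predict over. A pointer sidesteps this only for quantities that already exist in the B-rep, which is why new real-valued parameters remain an open issue.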

Pointer‑CAD marks a significant step toward truly intelligent CAD generation, bridging the gap between language understanding and precise geometric manipulation. For developers eager to embed generative design capabilities into their products, the paper offers both a solid technical foundation and a roadmap for practical integration.

Authors

Dacheng Qi
Chenyu Wang
Jingwei Xu
Tianzhe Chu
Zibo Zhao
Wen Liu
Wenrui Ding
Yi Ma
Shenghua Gao

Paper Information

arXiv ID: 2603.04337v1
Categories: cs.CV, cs.CL
Published: March 4, 2026
