[Paper] GIST: Gauge-Invariant Spectral Transformers for Scalable Graph Neural Operators

Published: March 17, 2026 at 01:54 PM EDT

Source: arXiv - 2603.16849v1

Overview

The paper introduces GIST (Gauge‑Invariant Spectral Transformers), a new graph‑transformer architecture that lets you feed meshes or irregular graphs into a transformer‑style model without the usual costly eigendecompositions or loss of gauge invariance (i.e., the model’s predictions staying consistent across different numerical representations of the same underlying geometry). By combining random projections with an inner‑product‑based attention mechanism, GIST runs in linear time (O(N)) and can be trained on one mesh resolution and deployed on another—an attractive property for neural‑operator tasks such as fluid‑dynamics simulation.

Key Contributions

  • Linear‑time spectral attention: Replaces exact spectral transforms (cubic‑time eigendecompositions) with random projections, achieving true O(N) complexity.
  • Algorithmic gauge invariance: Guarantees that the attention computation is invariant to the choice of basis (the “gauge”), eliminating catastrophic failures when switching discretizations.
  • Theoretical guarantees: Provides a bound on the mismatch error between different mesh discretizations, proving discretization‑invariant learning.
  • State‑of‑the‑art performance: Matches or exceeds leading GNNs on classic graph benchmarks (e.g., 99.50 % micro‑F1 on PPI) and sets new records on large‑scale neural‑operator datasets (DrivAerNet / DrivAerNet++ with up to 750 k nodes).
  • Scalable neural‑operator pipeline: Demonstrates that a single set of learned parameters can be transferred across meshes of varying resolution, a long‑standing hurdle for physics‑informed ML.

Methodology

  1. Random spectral projection:

    • Instead of computing the full Laplacian eigenbasis, GIST draws a set of random Gaussian vectors and projects node features onto this low‑dimensional subspace.
    • The projection preserves inner products in expectation, which is sufficient for attention because the transformer only needs relative similarities between tokens.
  2. Inner‑product‑based attention:

    • After projection, attention scores are computed as simple dot products (no softmax over eigenvalues).
    • Because dot products are invariant to orthogonal transformations of the basis, the resulting attention is gauge‑invariant by construction.
  3. Linear‑time implementation:

    • Both the projection and the attention can be expressed as sparse matrix‑vector multiplications, yielding an overall O(N) cost per layer.
  4. Training & transfer:

    • The model is trained end‑to‑end on a source mesh (or graph).
    • At inference time, the same learned weights are applied to a target mesh with a different resolution; the random projection adapts automatically, preserving performance.
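The projection‑plus‑dot‑product pipeline above can be sketched in a few lines of NumPy. This is an illustrative toy under assumed dimensions and a dense Gaussian projection, not the authors' implementation (a real mesh would use sparse graph operators), but it shows the two key properties: the N × N attention matrix is never materialized, and an orthogonal change of basis leaves the output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 500, 32, 64            # nodes, feature dim, projection dim (illustrative)
X = rng.standard_normal((N, d))  # node features
V = rng.standard_normal((N, d))  # value vectors

# Step 1: random projection in place of an exact spectral transform;
# inner products are preserved in expectation.
P = rng.standard_normal((d, k)) / np.sqrt(k)
Z = X @ P

# Steps 2-3: inner-product attention, evaluated right-to-left so the
# N x N score matrix is never formed: cost is O(N * k * d).
out = Z @ (Z.T @ V)

# Gauge invariance: an orthogonal change of basis R leaves the output
# unchanged, because (Z R)(Z R)^T = Z R R^T Z^T = Z Z^T.
R, _ = np.linalg.qr(rng.standard_normal((k, k)))
out_rotated = (Z @ R) @ ((Z @ R).T @ V)
print(np.allclose(out, out_rotated))  # True
```

Evaluating `Z @ (Z.T @ V)` right‑to‑left is what turns the quadratic attention into a linear‑in‑N computation.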

The authors back the design with a formal proof that the expected attention matrix under random projection deviates from the exact spectral attention by at most a bounded error term that shrinks with the projection dimension.
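That shrinking error can be illustrated numerically with a generic Johnson–Lindenstrauss‑style check (not the paper's actual bound): the average deviation of projected inner products from the exact value falls as the projection dimension k grows.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = x @ y

def mean_abs_error(k, trials=1000):
    """Average |<Px, Py> - <x, y>| over random Gaussian projections P."""
    errs = []
    for _ in range(trials):
        P = rng.standard_normal((d, k)) / np.sqrt(k)
        errs.append(abs((x @ P) @ (y @ P) - exact))
    return float(np.mean(errs))

# The deviation shrinks roughly like 1/sqrt(k) as k grows.
for k in (8, 64, 512):
    print(k, round(mean_abs_error(k), 3))
```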

Results & Findings

| Benchmark | Metric | GIST | Prior SOTA |
| --- | --- | --- | --- |
| PPI (protein‑protein interaction) | micro‑F1 | 99.50 % | 98.9 % |
| ZINC (molecular property) | MAE | 0.089 | 0.092 |
| DrivAerNet (aerodynamic pressure field) | RMSE | 0.012 | 0.018 |
| DrivAerNet++ (750 k nodes) | RMSE | 0.014 | 0.023 |
  • Scalability: GIST processes meshes with up to 750 k nodes on a single GPU (16 GB) without resorting to patching or hierarchical pooling.
  • Generalization across discretizations: When the same model is evaluated on a coarser/finer mesh of the same geometry, performance drops by less than 1 %—a stark contrast to conventional spectral GNNs, which can fail completely.
  • Ablation studies: Removing the random projection or using a standard softmax attention breaks gauge invariance and leads to >20 % performance loss on the neural‑operator tasks.

Practical Implications

  • Neural operators for engineering simulations: Engineers can now train a surrogate model on a cheap, coarse mesh and deploy it on high‑resolution CFD meshes, cutting down simulation time dramatically.
  • Cross‑domain transfer: Because gauge invariance removes dependence on a specific discretization, the same model can be reused across different CAD tools, meshing libraries, or even point‑cloud representations.
  • Edge‑device deployment: Linear‑time attention and the absence of heavy eigendecompositions make GIST viable on resource‑constrained hardware (e.g., on‑board inference for autonomous drones).
  • Simplified pipelines: No need to store or recompute spectral bases for each new geometry; the random projection can be generated on the fly, streamlining data‑preprocessing.
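The train‑coarse/deploy‑fine and on‑the‑fly‑projection points can be sketched together: in a toy layer of this kind, the learned weights act only on feature dimensions, so the node count is free to change, and the projection is regenerated from a seed rather than stored (all names and shapes here are assumptions for illustration, not the authors' code).

```python
import numpy as np

d, k = 32, 64
rng = np.random.default_rng(2)
W = rng.standard_normal((d, d))  # "learned" weights (toy stand-in)

def gist_layer(X, W, k, seed):
    """One toy projection-attention layer; works for any node count N."""
    prng = np.random.default_rng(seed)  # projection generated on the fly
    P = prng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    Z = X @ P
    return Z @ (Z.T @ (X @ W))          # linear in the node count

# The same weights W apply to a coarse and a fine mesh of the same geometry.
coarse = gist_layer(rng.standard_normal((200, d)), W, k, seed=0)
fine = gist_layer(rng.standard_normal((5000, d)), W, k, seed=0)
print(coarse.shape, fine.shape)  # (200, 32) (5000, 32)
```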

Limitations & Future Work

  • Projection dimension trade‑off: While the theory guarantees bounded error, in practice a larger random projection dimension improves accuracy at the cost of memory; finding the sweet spot for ultra‑large meshes remains an engineering challenge.
  • Numerical stability on extreme meshes: Very irregular or highly anisotropic meshes can still introduce conditioning issues that affect the random projection quality.
  • Extension to dynamic graphs: The current formulation assumes a static graph/mesh; adapting GIST to time‑varying topologies (e.g., moving meshes) is left for future research.
  • Broader benchmark coverage: The authors note that testing on non‑physics domains (e.g., social networks) would help assess the generality of gauge invariance beyond geometric data.

Authors

  • Mattia Rigotti
  • Nicholas Thumiger
  • Thomas Frick

Paper Information

  • arXiv ID: 2603.16849v1
  • Categories: cs.LG
  • Published: March 17, 2026