[Paper] PSALM: applying Proportional SAmpLing strategy in Metamorphic testing

Published: December 15, 2025 at 10:04 AM EST

Source: arXiv - 2512.13414v1

Overview

Metamorphic testing (MT) sidesteps the classic “oracle problem” by checking whether related test executions obey predefined metamorphic relations (MRs). This paper introduces PSALM, a Proportional SAmpLing strategy that adapts the well‑known Proportional Sampling Strategy (PSS) to the dual‑selection problem in MT: picking source test cases and forming metamorphic groups (MGs). The authors prove that PSALM never performs worse than random selection and show, through a large empirical study, that it often outperforms state‑of‑the‑art MT selectors such as ART and MT‑ART.
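
To make the terminology concrete, here is a minimal sketch of one MR check. It uses the textbook relation sin(x) = sin(π − x), which is not taken from the paper, and the names `mr_holds` and `buggy_sin` are hypothetical. The source input and the follow-up input derived from it together form one metamorphic group.

```python
import math
from typing import Callable


def mr_holds(program: Callable[[float], float], source_x: float, tol: float = 1e-9) -> bool:
    """Check the MR program(x) == program(pi - x) for one metamorphic group."""
    follow_up_x = math.pi - source_x  # follow-up input derived from the source input
    return math.isclose(program(source_x), program(follow_up_x), abs_tol=tol)


def buggy_sin(x: float) -> float:
    """Hypothetical seeded fault: the result is off by 0.1 for inputs >= 2.0."""
    return math.sin(x) + (0.1 if x >= 2.0 else 0.0)


print(mr_holds(math.sin, 0.5))   # True: the correct program satisfies the MR
print(mr_holds(buggy_sin, 0.5))  # False: the follow-up input 2.64... hits the fault
print(mr_holds(buggy_sin, 1.5))  # True: neither input reaches the faulty region
```

The last call shows why selection matters: the same faulty program passes or fails the MR depending on which source test case is chosen.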

Key Contributions

  • Formal adaptation of PSS to MT – defines a proportional sampling scheme that works simultaneously for source test case selection and MG construction.
  • Theoretical guarantees – proves PSALM is never inferior to random sampling regardless of how the test domain is partitioned, and identifies conditions where source‑case and MG selection have identical effectiveness.
  • Comprehensive empirical evaluation – 8 real‑world programs, 184 seeded mutants, and comparison against ART/MT‑ART, confirming the theoretical advantages.
  • Practical selection algorithm – provides a concrete, easy‑to‑implement procedure that can be plugged into existing MT pipelines.
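
For reference, the classic PSS guarantee from the partition‑testing literature can be written as below. This is the conventional‑testing form, assuming disjoint partitions and sampling with replacement; the symbols are notation chosen here, and the paper's MT‑specific statement for source test cases and MGs may be formulated differently.

```latex
% Domain D of size d with m failure-causing inputs, split into k disjoint
% partitions D_i of size d_i with m_i failure-causing inputs.
% n tests are run in total, n_i of them drawn from D_i.
\begin{align*}
\theta &= \frac{m}{d}, \qquad \theta_i = \frac{m_i}{d_i}, \qquad \sum_{i=1}^{k} n_i = n \\
P_r &= 1 - (1 - \theta)^{n} && \text{(random testing)} \\
P_p &= 1 - \prod_{i=1}^{k} (1 - \theta_i)^{n_i} && \text{(partition testing)} \\
\frac{n_i}{d_i} &= \frac{n}{d} \ \text{for all } i \;\Longrightarrow\; P_p \ge P_r && \text{(proportional sampling)}
\end{align*}
```

Here $P_r$ and $P_p$ are the probabilities of detecting at least one failure; the implication holds without knowing any of the failure rates $\theta_i$.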

Methodology

  1. Problem Formalization – The authors model MT as two linked sampling problems: (a) selecting a set of source inputs, and (b) grouping each source with its follow‑up inputs to form MGs.
  2. Proportional Sampling Extension – Starting from classic PSS (which allocates test cases to each partition in proportion to that partition's size, without needing to know failure rates), they extend the allocation so that it covers both the source test cases and the MGs built from them.
  3. Proof Sketch – Using combinatorial arguments, they demonstrate that for any partition of the input space, the expected fault‑detection rate of PSALM ≥ that of uniform random sampling.
  4. Empirical Setup
    • Subjects: 8 open‑source Java programs (e.g., JFreeChart, Commons‑Math).
    • Mutants: 184 mutants generated with PIT, representing realistic faults.
    • Baselines: Random selection, Adaptive Random Testing (ART), and MT‑ART (the MT‑specific variant of ART).
    • Metrics: Fault detection rate, number of test executions needed to expose a mutant, and runtime overhead.
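
The allocation idea can be sketched in a few lines. This is a minimal illustration of size‑proportional allocation applied to source test cases, not the authors' PSALM procedure. It assumes each partition is a non‑empty finite list of candidate source inputs, a single MR that yields one follow‑up input per source, and sampling with replacement. The names `proportional_allocation`, `select_metamorphic_groups`, and `follow_up` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def proportional_allocation(sizes: Sequence[int], n: int) -> List[int]:
    """Split n test slots across partitions in proportion to partition size.

    Fractional quotas are rounded with the largest-remainder method so the
    counts always sum to n.
    """
    total = sum(sizes)
    quotas = [n * s / total for s in sizes]
    counts = [int(q) for q in quotas]  # floor of each (non-negative) quota
    leftovers = n - sum(counts)
    # Give the remaining slots to the partitions with the largest fractional parts.
    by_remainder = sorted(range(len(sizes)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftovers]:
        counts[i] += 1
    return counts


def select_metamorphic_groups(
    partitions: Sequence[Sequence[float]],
    n: int,
    follow_up: Callable[[float], float],
    rng: random.Random,
) -> List[Tuple[float, float]]:
    """Pick n source inputs proportionally and pair each with its follow-up input."""
    sizes = [len(p) for p in partitions]
    groups = []
    for part, k in zip(partitions, proportional_allocation(sizes, n)):
        for source in rng.choices(part, k=k):  # uniform within the partition, with replacement
            groups.append((source, follow_up(source)))
    return groups


# Toy usage: three partitions of sizes 50, 30, and 20, and ten MGs to build.
parts = [list(range(0, 50)), list(range(50, 80)), list(range(80, 100))]
mgs = select_metamorphic_groups(parts, 10, lambda x: x + 100, random.Random(0))
print(proportional_allocation([50, 30, 20], 10))  # [5, 3, 2]
print(len(mgs))                                   # 10
```

A fixed `random.Random` seed keeps the selection reproducible, which is convenient when comparing selectors on the same mutants.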

Results & Findings

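Metric               | PSALM vs. Random                            | PSALM vs. ART | PSALM vs. MT‑ART
---------------------|---------------------------------------------|---------------|-----------------
Fault detection rate | +8 % on average (statistically significant) | +5 %          | +6 %
Tests to first fault | 12 % fewer tests needed                     | 9 % fewer     | 10 % fewer
Runtime overhead     | Negligible (< 2 % extra)                    | Comparable    | Comparable
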
  • The theoretical advantage of PSALM manifested consistently across all subjects.
  • In cases where the source‑case and MG partitions aligned (the “identical effectiveness” condition), PSALM’s benefit over ART vanished, confirming the authors’ analytical prediction.
  • The overhead of computing proportional probabilities was minimal, making PSALM practical for large test suites.
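
A toy calculation helps show where a never‑worse‑than‑random property comes from. The sketch below uses the classic partition‑testing formulas for the probability of detecting at least one failure (sampling with replacement, disjoint partitions). The partition sizes and failure counts are made‑up illustration values, not data from the paper, and the function names are hypothetical.

```python
from math import prod
from typing import Sequence


def p_random(sizes: Sequence[int], failures: Sequence[int], n: int) -> float:
    """P(at least one failure) when n tests are drawn uniformly from the whole domain."""
    theta = sum(failures) / sum(sizes)  # overall failure rate
    return 1 - (1 - theta) ** n


def p_partition(sizes: Sequence[int], failures: Sequence[int], alloc: Sequence[int]) -> float:
    """P(at least one failure) when alloc[i] tests are drawn uniformly from partition i."""
    return 1 - prod((1 - m / d) ** k for d, m, k in zip(sizes, failures, alloc))


sizes = [50, 30, 20]   # partition sizes (illustrative)
failures = [0, 3, 2]   # failure-causing inputs per partition (illustrative)
alloc = [5, 3, 2]      # 10 tests allocated in proportion to partition size

print(round(p_random(sizes, failures, 10), 4))        # 0.4013
print(round(p_partition(sizes, failures, alloc), 4))  # 0.4095
```

When every partition has the same failure rate, the two probabilities coincide; the proportional allocation gains only when failure rates differ across partitions.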

Practical Implications

  • Plug‑and‑play for MT frameworks – PSALM can replace the default random selector in tools like MetamorphicTest or EvoSuite with a drop‑in module.
  • Higher fault‑detection efficiency – Developers can reach the same fault‑detection level with fewer test executions, saving CI time and compute resources.
  • Better ROI on MR engineering – Since MT already requires effort to craft high‑quality MRs, PSALM helps recover more value from each MR through smarter test selection.
  • Scalable to large codebases – The low computational cost means PSALM is suitable for continuous‑integration pipelines that run thousands of MT cases nightly.

Limitations & Future Work

  • Assumption of known partitions – PSALM needs a partitioning of the input space to be defined up front. The never‑worse‑than‑random guarantee holds for any partition, but poorly chosen partitions may shrink the practical advantage.
  • Focus on Java mutants – The empirical study is limited to Java programs and PIT mutants; cross‑language validation remains open.
  • Static proportional model – The current implementation uses a static probability distribution; future work could explore dynamic, data‑driven updates as test results accumulate.
  • Integration with MR generation – The authors note that coupling PSALM with automated MR synthesis could further boost MT effectiveness, a promising direction for follow‑up research.

Authors

  • Zenghui Zhou
  • Pak-Lok Poon
  • Zheng Zheng
  • Xiao-Yi Zhang

Paper Information

  • arXiv ID: 2512.13414v1
  • Categories: cs.SE
  • Published: December 15, 2025