[Paper] Bug Priority Change Prediction: An Exploratory Study on Apache Software

Published: December 9, 2025 at 07:59 PM EST
3 min read
Source: arXiv - 2512.09216v1

Overview

This paper tackles a surprisingly overlooked problem in modern issue‑tracking: predicting when a bug’s priority will change during its lifecycle. By mining data from 32 Apache projects, the authors show that features derived from the bug‑fixing process—combined with smart handling of class imbalance—can forecast priority shifts with promising accuracy, opening the door to more proactive triage and resource allocation.

Key Contributions

  • Two‑phase prediction framework that separates the bug reporting and bug fixing stages, training dedicated models for each (see the sketch after this list).
  • Bug‑fixing evolution features (e.g., comment frequency, developer activity, code change metrics) that capture how a bug’s context evolves over time.
  • Class‑imbalance mitigation strategy (oversampling + cost‑sensitive learning) tailored to the heavily skewed distribution of priority changes.
  • Extensive empirical evaluation on a curated dataset of > 200 k bug reports from 32 Apache projects, reporting F1‑scores up to 0.80.
  • Cross‑project analysis revealing how well models trained on one project transfer to others and how performance varies across priority levels.
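
To make the two‑phase idea concrete, here is a minimal sketch of dedicated per‑phase models with predictions routed by a bug's current lifecycle phase. The class and method names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-phase framework: a dedicated classifier per
# lifecycle phase, with predictions routed by the bug's current phase.
# Class and phase names are illustrative, not the authors' code.
from enum import Enum
from sklearn.ensemble import RandomForestClassifier

class Phase(Enum):
    REPORTING = "reporting"  # bug creation -> first status change
    FIXING = "fixing"        # first status change -> resolution

class TwoPhasePredictor:
    def __init__(self):
        # One model per phase, each trained on phase-specific features.
        self.models = {phase: RandomForestClassifier(class_weight="balanced")
                       for phase in Phase}

    def fit(self, phase, X, y):
        self.models[phase].fit(X, y)
        return self

    def predict_change(self, phase, X):
        # Route the bug to the model trained for its current phase.
        return self.models[phase].predict(X)
```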

Methodology

  1. Data Collection & Labeling

    • Extracted bug reports from Apache JIRA, focusing on non‑trivial projects (e.g., Hadoop, Spark).
    • Each bug was labeled “priority changed” or “unchanged” based on the history of its priority field.
  2. Lifecycle Segmentation

    • Reporting Phase: From bug creation until the first status change (e.g., Open → In Progress).
    • Fixing Phase: From the first status change until the bug is resolved/closed.
  3. Feature Engineering

    • Static attributes: initial priority, severity, component, reporter reputation.
    • Dynamic evolution attributes: number of comments, time between comments, number of developers involved, lines of code changed, test coverage impact, etc. (steps 1–3 are sketched in the first example after this list).
  4. Handling Class Imbalance

    • Applied SMOTE (Synthetic Minority Over‑sampling Technique) to generate synthetic “priority‑change” instances.
    • Integrated cost‑sensitive classifiers that penalize misclassifying the minority class more heavily.
  5. Model Training & Evaluation

    • Tested several algorithms (Random Forest, XGBoost, Logistic Regression).
    • Used stratified 10‑fold cross‑validation, reporting F1 (binary), F1‑weighted, and F1‑macro (the imbalance handling and evaluation setup appear in the second sketch after this list).
  6. Cross‑Project & Priority‑Level Experiments

    • Trained on one project, tested on another to gauge generalizability (see the transfer sketch after this list).
    • Analyzed performance per priority tier (e.g., P1‑Critical vs. P4‑Low).
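
To make steps 1–3 concrete, here is a minimal sketch of labeling, lifecycle segmentation, and dynamic feature extraction. The JIRA changelog‑style event dicts (fields such as "priority", "status", "author") and the specific feature set are assumptions for illustration, not the authors' exact pipeline.

```python
# Sketch of labeling, lifecycle segmentation, and dynamic feature
# extraction (steps 1-3). Event dicts mimic JIRA changelog entries;
# the schema is an assumption, not the authors' exact pipeline.

def label_priority_change(events):
    """Binary label: did the 'priority' field ever change? (step 1)"""
    return int(any(e["field"] == "priority" for e in events))

def split_phases(events):
    """Split the event stream at the first status change (step 2)."""
    for i, e in enumerate(events):
        if e["field"] == "status":
            return events[:i], events[i:]   # reporting, fixing
    return events, []                        # bug never left the reporting phase

def fixing_features(fixing_events, comments):
    """Dynamic evolution features for the fixing phase (step 3).

    Assumes each comment dict carries a datetime under 'created'.
    """
    times = sorted(c["created"] for c in comments)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "n_comments": len(comments),
        "mean_comment_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "n_developers": len({e["author"] for e in fixing_events}),
        "loc_changed": sum(e.get("loc", 0) for e in fixing_events
                           if e["field"] == "code"),
    }
```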
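
Steps 4–5 map onto standard scikit‑learn / imbalanced‑learn tooling. A sketch, assuming those libraries are installed: SMOTE is applied inside each cross‑validation fold (so synthetic samples never leak into the validation split) and paired with a cost‑sensitive random forest, while the synthetic dataset stands in for the real bug‑report features.

```python
# Sketch of imbalance handling plus stratified 10-fold evaluation
# (steps 4-5). Requires scikit-learn and imbalanced-learn; the
# synthetic dataset stands in for the real bug-report features.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Skewed toy data: ~10% minority ("priority changed") instances.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1],
                           random_state=42)

pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),   # oversample minority class per fold
    ("clf", RandomForestClassifier(
        class_weight="balanced",         # cost-sensitive misclassification penalty
        random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv,
                        scoring=["f1", "f1_weighted", "f1_macro"])
for name in ("test_f1", "test_f1_weighted", "test_f1_macro"):
    print(f"{name}: {scores[name].mean():.3f}")
```

Using the imblearn `Pipeline` matters here: plain SMOTE applied before cross‑validation would oversample the whole dataset and inflate the validation scores.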
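
The cross‑project experiment (step 6) amounts to training on one project's data and evaluating on another's. A sketch, with synthetic per‑project datasets standing in for the real Apache feature matrices:

```python
# Sketch of the cross-project transfer experiment (step 6): train on
# one project, evaluate on another. Synthetic datasets stand in for
# the per-project feature matrices.
from itertools import permutations
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

projects = {
    name: make_classification(n_samples=500, weights=[0.9, 0.1],
                              random_state=seed)
    for seed, name in enumerate(["HADOOP", "SPARK"])
}

for source, target in permutations(projects, 2):
    X_train, y_train = projects[source]
    X_test, y_test = projects[target]
    model = RandomForestClassifier(class_weight="balanced", random_state=42)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{source} -> {target}: weighted F1 = "
          f"{f1_score(y_test, pred, average='weighted'):.3f}")
```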

Results & Findings

Phase     | Metric      | Score
----------|-------------|------
Reporting | F1 (binary) | 0.798
Fixing    | F1‑weighted | 0.712
Fixing    | F1‑macro    | 0.613
  • Bug‑fixing evolution features consistently outperformed baseline models that used only static attributes.
  • The imbalance handling strategy contributed an average lift of ~6 % in F1 across both phases.
  • Cross‑project transfer: While absolute scores dropped when applying a model to a different project, weighted F1 stayed above 0.60 for most pairs, indicating reasonable portability.
  • Priority‑level robustness: Prediction quality remained relatively stable across P1‑P4, suggesting the approach is not biased toward high‑severity bugs.

Practical Implications

  • Automated Triage Assistants: Integrate the model into JIRA or GitHub Issues to flag bugs likely to need a priority bump, prompting early review by project managers (see the sketch after this list).
  • Resource Planning: Teams can anticipate spikes in high‑priority work, adjusting sprint capacity or allocating on‑call engineers proactively.
  • Reduced Human Bias: By providing data‑driven suggestions, the system mitigates subjective over‑ or under‑prioritization that often stems from “triage fatigue.”
  • Cross‑Project Knowledge Sharing: Open‑source foundations can seed new projects with pre‑trained models, accelerating effective bug management without extensive local data collection.
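
As one way such a triage assistant could work, the sketch below scores open issues with a trained model and flags those whose predicted probability of a priority change crosses a threshold. The issue dict schema, `to_features` helper, and the 0.7 threshold are hypothetical placeholders, not part of the paper.

```python
# Hypothetical triage-assistant hook: flag open issues that a trained
# model considers likely to need a priority change. The issue schema,
# `to_features`, and the 0.7 threshold are illustrative placeholders.
FLAG_THRESHOLD = 0.7  # tune against historical triage decisions

def flag_priority_change_candidates(model, issues, to_features):
    flagged = []
    for issue in issues:
        # Probability of the minority "priority will change" class.
        p_change = model.predict_proba([to_features(issue)])[0][1]
        if p_change >= FLAG_THRESHOLD:
            flagged.append((issue["key"], p_change))
    # Surface the riskiest issues first for the project manager.
    return sorted(flagged, key=lambda item: -item[1])
```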

Limitations & Future Work

  • Dataset Scope: The study focuses on Apache projects; results may differ for commercial or smaller‑scale repositories with different workflow conventions.
  • Feature Freshness: Some evolution features (e.g., comment velocity) require real‑time updates, which could be costly to compute at scale.
  • Granular Priority Changes: The binary “changed vs. unchanged” label ignores the direction (e.g., upgrade vs. downgrade) and magnitude of the shift.
  • Future Directions:
    • Extend to multiclass prediction (predict exact new priority).
    • Explore deep‑learning sequence models (e.g., Transformers) to capture richer temporal patterns.
    • Conduct user studies to assess how developers interact with priority‑change recommendations in practice.

Authors

  • Guangzong Cai
  • Zengyang Li
  • Peng Liang
  • Ran Mo
  • Hui Liu
  • Yutao Ma

Paper Information

  • arXiv ID: 2512.09216v1
  • Categories: cs.SE
  • Published: December 10, 2025
  • PDF: https://arxiv.org/pdf/2512.09216v1