[Paper] Do Good, Stay Longer? Temporal Patterns and Predictors of Newcomer-to-Core Transitions in Conventional OSS and OSS4SG
Source: arXiv - 2601.23142v1
Overview
The paper investigates why most newcomers to open‑source projects never become core maintainers, and whether projects that aim for “social good” (OSS4SG) behave differently from conventional OSS. Analyzing roughly 92,000 contributors across 375 repositories, the authors identify distinct temporal contribution patterns associated with a faster and more likely transition from newcomer to core contributor.
Key Contributions
- Large‑scale comparative study of 190 OSS4SG and 185 conventional OSS projects (≈3.5 M commits, 92 k contributors).
- Retention advantage: OSS4SG projects keep newcomers 2.2× longer and give them a 19.6 % higher chance of becoming core members.
- Temporal contribution patterns identified:
  - Early Spike – heavy activity right after the first commit.
  - Late Spike – a period of low‑intensity exploration followed by a burst of activity.
- Predictive insight: Early broad exploration of the codebase (Late Spike) shortens the time‑to‑core by 2.4–2.9× (≈21 weeks vs. 51–60 weeks).
- Pathway diversity: Conventional OSS relies on a single dominant transition path (≈62 % of cases), whereas OSS4SG offers multiple viable routes.
- Actionable guidance for newcomers (choose value‑aligned projects, spend time learning before large contributions) and maintainers (design onboarding that encourages early exploration).
Methodology
- Dataset construction – The authors mined GitHub repositories, classifying projects as OSS4SG or conventional OSS using a curated list of mission‑statement keywords and manual verification.
- Contributor lifecycle extraction – For each contributor, the timeline from first commit to last commit (or promotion to core) was built, yielding over 3 M commit events.
- Temporal pattern detection – Using clustering on weekly commit frequency vectors, two dominant patterns emerged: Early Spike (high initial activity) and Late Spike (initial low activity, later surge).
- Statistical modeling – Survival analysis (Cox proportional hazards) measured time‑to‑core, while logistic regression estimated the probability of becoming core, controlling for project size, language, and activity level.
- Feature importance – Permutation importance quantified how much early exploration contributed to core‑transition likelihood (≈22 % of predictive power).
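The pattern-detection step can be illustrated with a minimal sketch. It builds the weekly commit-frequency vector described above and then applies a deliberately simple peak-position rule in place of the authors' clustering, whose details this summary does not give. The function names, the 8-week window, and the "peak in the first half" rule are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def weekly_commit_vector(commit_times, n_weeks):
    """Count commits per week; week 0 starts at the contributor's first commit."""
    counts = [0] * n_weeks
    if not commit_times:
        return counts
    start = min(commit_times)
    for t in commit_times:
        week = (t - start).days // 7
        if week < n_weeks:
            counts[week] += 1
    return counts


def label_pattern(vector):
    """Hypothetical rule (not the paper's clustering): where does the peak week fall?"""
    if sum(vector) == 0:
        return "inactive"
    peak_week = vector.index(max(vector))
    return "early_spike" if peak_week < len(vector) / 2 else "late_spike"


# Two made-up contributors, observed over an assumed 8-week window.
first = datetime(2024, 1, 1)
early = [first + timedelta(days=d) for d in (0, 1, 2, 3, 10, 40)]
late = [first + timedelta(days=d) for d in (0, 20, 50, 51, 52, 53)]

early_vec = weekly_commit_vector(early, n_weeks=8)
late_vec = weekly_commit_vector(late, n_weeks=8)
print(early_vec, label_pattern(early_vec))
print(late_vec, label_pattern(late_vec))
```

A real analysis would cluster many such vectors rather than threshold a single one; the sketch only shows the shape of the input data.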
Results & Findings
| Metric | Conventional OSS | OSS4SG |
|---|---|---|
| Contributor retention (weeks) | 31 ± 12 | 68 ± 15 |
| Probability of reaching core | 0.31 | 0.37 |
| Dominant transition pathway | Early Spike (61.6 % of promotions) | Late Spike (45 %) + Early Spike (30 %) + Mixed (25 %) |
| Time‑to‑core (Late Spike) | 51–60 weeks | 21 weeks |
| Time‑to‑core (Early Spike) | 51–60 weeks | 51–60 weeks (no speed‑up) |
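As a sanity check, the headline ratios can be re-derived from the rounded values in the table above. The variable names are illustrative. The relative gain in core probability comes out slightly below the reported 19.6 %, presumably because the table values are rounded.

```python
# Values copied from the results table (rounded as shown there).
retention_weeks = {"conventional": 31, "oss4sg": 68}
core_probability = {"conventional": 0.31, "oss4sg": 0.37}
late_spike_weeks = 21
early_spike_weeks = (51, 60)

# Retention advantage of OSS4SG over conventional OSS.
retention_ratio = retention_weeks["oss4sg"] / retention_weeks["conventional"]

# Relative increase in the probability of reaching core.
relative_gain = core_probability["oss4sg"] / core_probability["conventional"] - 1

# Speed-up of Late Spike relative to the 51-60 week range.
speedup_low = early_spike_weeks[0] / late_spike_weeks
speedup_high = early_spike_weeks[1] / late_spike_weeks

print(f"retention ratio: {retention_ratio:.1f}x")      # 2.2x
print(f"relative core gain: {relative_gain:.1%}")      # 19.4%
print(f"speed-up: {speedup_low:.1f}x to {speedup_high:.1f}x")  # 2.4x to 2.9x
```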
- Early broad exploration (contributing a few small patches across different modules before focusing) is the strongest predictor of a fast promotion.
- Late Spike is the fastest route: in the table above, OSS4SG Late Spike contributors reach core in about 21 weeks, while Early Spike contributors take 51–60 weeks in both project types.
- Projects with a social‑good mission tend to have more welcoming cultures, clearer contribution guidelines, and higher “value alignment,” which together boost newcomer persistence.
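One way to make "early broad exploration" measurable is to count how many distinct parts of the codebase a newcomer touches in their first weeks. The sketch below assumes a "module" is a top-level directory and uses a 4-week window. Both choices, and the function names, are assumptions for illustration, not the paper's definition.

```python
from pathlib import PurePosixPath


def top_level_module(path):
    """Treat the first directory of a repo-relative path as its module (an assumption)."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "(root)"


def exploration_breadth(commits, first_weeks=4):
    """Number of distinct modules touched during the first `first_weeks` weeks.

    `commits` is a list of (week_index, [file paths]) tuples.
    """
    modules = set()
    for week, paths in commits:
        if week < first_weeks:
            modules.update(top_level_module(p) for p in paths)
    return len(modules)


# Made-up histories: one newcomer explores widely, the other stays in one area.
explorer = [
    (0, ["docs/intro.md"]),
    (1, ["src/cli/main.py", "tests/test_cli.py"]),
    (2, ["README.md"]),
    (9, ["src/core/engine.py"]),  # outside the 4-week window, ignored
]
focused = [(0, ["src/core/a.py"]), (1, ["src/core/b.py"])]

print(exploration_breadth(explorer))  # 4
print(exploration_breadth(focused))   # 1
```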
Practical Implications
For Developers / Newcomers
- Pick projects that match your personal values – OSS4SG repositories often have more supportive onboarding and clearer impact narratives, which can keep you motivated.
- Spend the first few weeks exploring – Submit small, low‑risk changes across different parts of the codebase before committing to a large feature. In the study, this “Late Spike” pattern is associated with reaching core status 2.4–2.9× faster (≈21 weeks vs. 51–60 weeks).
- Track your own activity pattern – Tools like GitHub’s contribution graph can help you visualize whether you’re in an Early or Late Spike mode; aim for a gradual ramp‑up.
For Maintainers / Project Leads
- Design onboarding that encourages exploration – Provide a “starter‑issues” list spanning multiple modules, mentorship for code‑base tours, and low‑barrier PR templates.
- Highlight the project’s mission – Explicitly stating the social impact can improve retention, especially for contributors seeking purpose‑driven work.
- Monitor temporal patterns – Use analytics to spot contributors stuck in an Early Spike (high early activity but no follow‑up) and reach out with guidance or mentorship.
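The monitoring idea could be prototyped with a simple rule over weekly commit counts: flag a contributor when most of their activity falls in the first few weeks and the most recent weeks are silent. The thresholds (`early_weeks`, `quiet_weeks`, `early_share`) and the function name are hypothetical and would need tuning against a project's own history.

```python
def is_stalled_early_spike(weekly_commits, early_weeks=4, quiet_weeks=6, early_share=0.7):
    """Flag high early activity followed by recent silence (illustrative thresholds)."""
    total = sum(weekly_commits)
    # Not enough history, or no activity at all: nothing to flag.
    if total == 0 or len(weekly_commits) < early_weeks + quiet_weeks:
        return False
    early = sum(weekly_commits[:early_weeks])
    recent = sum(weekly_commits[-quiet_weeks:])
    return early / total >= early_share and recent == 0


# Made-up weekly commit counts for three contributors.
history = {
    "alice": [9, 6, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # burst, then silence
    "bob": [1, 0, 1, 2, 0, 1, 3, 5, 6, 4, 5, 3],    # gradual ramp-up
    "carol": [5, 3],                                # too new to judge
}

to_contact = [name for name, weeks in history.items() if is_stalled_early_spike(weeks)]
print(to_contact)  # ['alice']
```

A maintainer would then reach out to the flagged contributors manually; the rule only narrows down whom to look at.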
For Organizations & Tool Builders
- Integrate pattern detection into contribution dashboards – Offer alerts when a newcomer’s activity resembles an Early Spike, suggesting a “slow‑down & explore” nudge.
- Leverage mission‑tagging – Platforms could surface OSS4SG projects to developers looking for purpose‑aligned work, improving match quality and long‑term sustainability.
Limitations & Future Work
- Mission classification relies on keyword heuristics and manual checks; some OSS4SG projects may be missed or mis‑labelled.
- The study focuses on GitHub and may not generalize to other hosting platforms (GitLab, Bitbucket) or to private/open‑source hybrids.
- Temporal patterns were limited to weekly granularity; finer‑grained analysis (daily activity, issue comments) could reveal additional pathways.
- Future research could explore causal interventions (e.g., A/B testing onboarding flows) and examine how community governance models interact with the identified patterns.
Bottom line: If you want to move from “first‑timer” to “core maintainer,” pick a project whose mission resonates with you, spend a few weeks getting your hands dirty across the codebase, and then dive deep. For maintainers, fostering that exploratory phase and emphasizing purpose can dramatically improve newcomer retention and the health of the project.
Authors
- Mohamed Ouf
- Amr Mohamed
- Mariam Guizani
Paper Information
- arXiv ID: 2601.23142v1
- Categories: cs.SE
- Published: January 30, 2026