[Paper] Social Proof is in the Pudding: The (Non)-Impact of Social Proof on Software Downloads
Source: arXiv - 2603.07919v1
Overview
The paper investigates whether “social proof” – the popularity cues developers see on platforms like GitHub (e.g., stars, download counts) – actually drives the adoption of open‑source packages. By running two large‑scale field experiments on real Python packages, the authors show that inflating these signals has virtually no effect on subsequent downloads or on other forms of developer engagement.
Key Contributions
- First large‑scale field experiment on social proof for software adoption – the authors bought stars for a random set of GitHub repositories and measured downstream effects.
- Second field experiment manipulating human‑download counts – they artificially increased the number of recorded downloads for a different set of packages.
- Comprehensive outcome metrics – beyond raw download numbers, the study tracks forks, pull requests, issues, stars, and other activity signals.
- Empirical evidence that social proof does not sway developer behavior – both experiments reveal no statistically significant impact on any measured metric.
- Implications for security and platform design – the findings suggest that “gaming” popularity metrics is unlikely to succeed in steering developers toward malicious code.
Methodology
- Dataset & Randomization – The researchers selected a pool of newly published Python packages on GitHub. Packages were randomly assigned to treatment (receive boosted social proof) or control groups.
- Treatment A: Bought Stars – For the first experiment, the authors purchased a fixed number of GitHub stars for each treatment repository from a commercial service that supplies them via seemingly genuine ("real") accounts.
- Treatment B: Inflated Download Counts – In the second experiment, they scripted additional human‑like downloads (via distinct IPs and user‑agents) to increase the visible download tally for the treatment packages.
- Observation Window – After applying the treatments, the authors monitored each repository for several weeks, collecting:
  - Daily download counts (PyPI statistics)
  - GitHub activity: forks, pull requests, issues, new stars, watchers, and contributors
- Statistical Analysis – They employed difference‑in‑differences and regression models controlling for package age, initial popularity, and language‑specific trends to isolate the treatment effect.
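The core of the analysis is a difference-in-differences comparison: the change in downloads for treated packages minus the change for controls, which nets out any time trend shared by both groups. A minimal sketch on simulated data (all numbers and variable names here are illustrative, not the paper's; the authors' actual models also control for package age and baseline popularity):

```python
import random
import statistics

random.seed(0)
n = 200
# Random assignment to treatment (boosted social proof) or control.
treated = [random.random() < 0.5 for _ in range(n)]

# Simulated daily downloads before and after treatment: a shared
# time trend (+5 for everyone), but by construction NO effect of
# the treatment itself -- mirroring the paper's null result.
pre = [50 + random.gauss(10, 3) for _ in range(n)]
post = [55 + random.gauss(10, 3) for _ in range(n)]

change_treated = statistics.mean(
    post[i] - pre[i] for i in range(n) if treated[i]
)
change_control = statistics.mean(
    post[i] - pre[i] for i in range(n) if not treated[i]
)

# Difference-in-differences: treatment effect net of the shared trend.
did = change_treated - change_control
print(round(did, 2))  # near zero under the simulated null
```

Because the +5 trend affects both groups, it cancels in the subtraction; under the simulated null the estimate hovers near zero, which is the pattern the paper reports for its real treatments.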
Results & Findings
- No measurable lift in downloads – Packages that received extra stars or inflated download numbers did not experience a statistically significant increase in subsequent downloads compared to controls.
- Developer engagement unchanged – Forks, pull requests, issue creation, new stars, and other activity metrics remained indistinguishable between treated and untreated repositories.
- Effect size near zero – Confidence intervals for all outcome variables included zero, indicating that any potential impact is too small to be practically relevant.
- Robustness checks – Results held across different treatment intensities (e.g., 10 vs. 100 purchased stars) and across various time windows.
Practical Implications
- Security posture: Attackers cannot reliably “bootstrap” adoption of malicious packages by simply buying stars or fabricating download counts, reducing a feared attack vector.
- Platform design: GitHub and package registries could reduce their reliance on raw popularity metrics in recommendation engines and instead emphasize quality signals (e.g., test coverage, CI status).
- Developer decision‑making: Practitioners can be reassured that a package’s star count is not a strong predictor of future usage; deeper evaluation (documentation, code quality, community support) remains essential.
- Marketing strategies: Open‑source maintainers should not invest heavily in artificial popularity boosts; effort is better spent on improving documentation, issue response time, and real community engagement.
Limitations & Future Work
- Scope limited to Python packages – Results may differ for other ecosystems (e.g., JavaScript/npm, Rust/crates) where community norms vary.
- Short observation horizon – The study tracks effects over weeks; longer‑term adoption patterns (months or years) were not examined.
- Magnitude of manipulation – The treatments used realistic, modest boosts; extreme manipulations (e.g., thousands of stars) were not tested.
- User demographics – The experiments do not differentiate between novice vs. experienced developers, whose susceptibility to social proof could differ.
Future research could replicate the study across multiple language ecosystems, explore larger‑scale manipulations, and investigate whether other cues (e.g., badge displays, CI status) have stronger influence on software adoption decisions.
Authors
- Lucas Shen
- Gaurav Sood
Paper Information
- arXiv ID: 2603.07919v1
- Categories: cs.CY, cs.SE
- Published: March 9, 2026