Looks like it is happening
Source: Hacker News
Background
For a while I’ve been speculating about what would happen when AI agents become capable of writing papers indistinguishable in quality from the typical output of the hep‑th community. Sabine Hossenfelder’s recent video AI Is Bringing “The End of Theory” offers a cynical take: the traditional system—grant‑holding PIs using graduate students and postdocs to produce many mediocre papers under the PI’s name—may change dramatically. Once AI agents can generate mediocre papers much faster than humans, anyone could produce them, leading to a flood of such papers from both PIs and other researchers.
Data Collection
I used the arXiv advanced search page (https://arxiv.org/search/advanced) to count hep‑th submissions in several date ranges.
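The counts below were read off the search page by hand. As a rough sketch, similar counts could be fetched from the public arXiv API, assuming its `submittedDate:[... TO ...]` filter and the `opensearch:totalResults` field of the Atom response behave as described in the API manual. The function names are hypothetical, and since `cat:hep-th` may also match cross-listed papers, the results need not agree exactly with the advanced search page.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint and XML namespace, taken from the arXiv API manual.
API_URL = "http://export.arxiv.org/api/query"
OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"


def build_query_url(category: str, start: str, end: str) -> str:
    """Build a query URL for one category and one date window.

    start and end are YYYYMMDD strings; the window runs from 00:00 on
    the start date to 23:59 on the end date.
    """
    search = f"cat:{category} AND submittedDate:[{start}0000 TO {end}2359]"
    # Only the total is needed, so request a single entry.
    params = {"search_query": search, "start": 0, "max_results": 1}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_total_results(atom_xml: str) -> int:
    """Extract the total number of matches from an Atom response."""
    root = ET.fromstring(atom_xml)
    node = root.find(f"{OPENSEARCH_NS}totalResults")
    if node is None or node.text is None:
        raise ValueError("no opensearch:totalResults element in response")
    return int(node.text)


def count_submissions(category: str, start: str, end: str) -> int:
    """Query the API (network access required) and return the match count."""
    url = build_query_url(category, start, end)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_total_results(resp.read().decode("utf-8"))
```

A call such as `count_submissions("hep-th", "20250101", "20250201")` would then return one number per window. Anyone looping over many windows should pause between requests, since arXiv asks API users to keep their request rate low.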
December 1 – December 31
| Year | Submissions |
|---|---|
| 2022 | 634 |
| 2023 | 684 |
| 2024 | 780 |
| 2025 | 1192 |
January 1 – February 1
| Year | Submissions |
|---|---|
| 2022 | 583 |
| 2023 | 531 |
| 2024 | 626 |
| 2025 | 659 |
| 2026 | 1137 |
February 1 – February 15
| Year | Submissions |
|---|---|
| 2022 | 299 |
| 2023 | 266 |
| 2024 | 271 |
| 2025 | 333 |
| 2026 | 581 |
From this limited data it appears that submission numbers in the last couple of months are roughly 50 to 75 percent higher than in the same periods a year earlier, and approaching double the relatively stable figures of earlier years.
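To make the comparison concrete, here is a small sketch that computes the ratios directly from the tables above. The dictionary layout and the function name `growth_summary` are illustrative choices, not part of any existing tool.

```python
# Counts copied from the tables above, keyed by year.
counts = {
    "Dec 1 - Dec 31": {2022: 634, 2023: 684, 2024: 780, 2025: 1192},
    "Jan 1 - Feb 1": {2022: 583, 2023: 531, 2024: 626, 2025: 659, 2026: 1137},
    "Feb 1 - Feb 15": {2022: 299, 2023: 266, 2024: 271, 2025: 333, 2026: 581},
}


def growth_summary(by_year):
    """Compare the latest year with the previous year and with the mean of all earlier years."""
    years = sorted(by_year)
    latest, previous = years[-1], years[-2]
    earlier = [by_year[y] for y in years[:-1]]
    baseline = sum(earlier) / len(earlier)
    return {
        "latest_year": latest,
        "vs_previous_year": by_year[latest] / by_year[previous],
        "vs_earlier_mean": by_year[latest] / baseline,
    }


for window, by_year in counts.items():
    s = growth_summary(by_year)
    print(
        f"{window}: {s['latest_year']} is "
        f"{s['vs_previous_year']:.2f}x the previous year, "
        f"{s['vs_earlier_mean']:.2f}x the mean of earlier years"
    )
```

The mean of earlier years is only a crude baseline: it ignores any gradual upward trend, which would make the latest jump look somewhat larger than it is.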
Open Questions
- Can we reliably distinguish AI‑generated papers from human‑written ones?
- Do the raw submission counts reflect a genuine surge in AI‑produced work, or are there other explanations?
- What, if anything, is the arXiv doing to address a potential influx of AI‑generated content?
Call for Contributions
I lack the time to investigate further, so I’m looking for someone (perhaps an AI agent) to:
- Gather more comprehensive data across longer time spans.
- Analyze trends and assess whether the observed increase is statistically significant.
- Explore methods for detecting AI‑generated papers.
- Summarize findings in a detailed write‑up.
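As a starting point for the significance question, one very rough sketch is to treat the earlier years as independent draws from a normal distribution and ask how far the latest count sits from them, using the standard statistic for a single new observation. That assumption is almost certainly too simple (it ignores trends and uses only three or four baseline points), and the function name `new_observation_t` is hypothetical.

```python
import math
import statistics


def new_observation_t(earlier, latest):
    """t statistic for one new value against a small baseline sample.

    Under the (strong) assumption that the earlier counts are i.i.d.
    normal, this should be compared with a Student t distribution
    with len(earlier) - 1 degrees of freedom.
    """
    n = len(earlier)
    if n < 2:
        raise ValueError("need at least two earlier counts")
    mean = statistics.mean(earlier)
    sd = statistics.stdev(earlier)  # sample standard deviation (n - 1)
    return (latest - mean) / (sd * math.sqrt(1 + 1 / n))


# Baseline years and latest count for each window, copied from the tables.
windows = {
    "Dec 1 - Dec 31": ([634, 684, 780], 1192),
    "Jan 1 - Feb 1": ([583, 531, 626, 659], 1137),
    "Feb 1 - Feb 15": ([299, 266, 271, 333], 581),
}

for name, (earlier, latest) in windows.items():
    t = new_observation_t(earlier, latest)
    print(f"{name}: t = {t:.2f} with {len(earlier) - 1} degrees of freedom")
```

With so few degrees of freedom the critical values are large (roughly 3.2 for three degrees of freedom and 4.3 for two at the two-sided 5% level), so a longer baseline would be needed before drawing firm conclusions.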
If you have substantive insights—especially explanations for why these numbers might be misleading, or information about arXiv’s policies—please share them. I will moderate comments for relevance and factual accuracy but will not delete comments merely because they appear to be non‑human.