Auto-grading decade-old Hacker News discussions with hindsight

Published: December 11, 2025, 01:23 GMT+8

5 min read

Source: Hacker News

hnhero

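Overview

Yesterday I stumbled on this HN thread, Show HN: Gemini Pro 3 hallucinates the HN front page 10 years from now, in which Gemini 3 hallucinates the front page a decade from now. One comment linked to the HN front page from exactly ten years ago (https://news.ycombinator.com/front?day=2015-12-09). Reading those discussions, I realized that an LLM could grade them for prescience far more efficiently than I could by hand. So I used the newly released Opus 4.5 to build a pipeline that fetches every HN front-page article from December 2015, runs a hindsight analysis on each one with ChatGPT 5.1 Thinking, and renders the results as static HTML.

The code repository is on GitHub.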

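Why this exercise is interesting

Training a forward-looking predictor

Predicting the future can be treated as a serious, trainable skill. With enough historical data and a feedback loop, we can get better at forecasting, which is exactly what the "superforecasters" advocate.

Future LLMs are watching

Everything we do today may be scrutinized in detail by future intelligent systems that are cheap to run. Once perfect reconstruction and synthesis become possible, the implicit assumption of "security through obscurity" becomes fragile. It is wiser to act responsibly now.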

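Implementation details

Data collection

Given a date, download the front page (about 30 articles).
For each article, fetch the article text and the full comment thread via the Algolia API.
Pack everything into a markdown prompt for the analysis step.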
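
The collection steps above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code. It assumes the public HN Algolia items endpoint (https://hn.algolia.com/api/v1/items/<id>), which returns a story with its comments nested under `children`. The names `fetch_item` and `thread_to_markdown`, and the markdown layout, are hypothetical. Fetching the body of the linked article is left out.

```python
import html
import json
import re
import urllib.request

# Assumed endpoint: the public HN Algolia API, one item per request.
ALGOLIA_ITEM_URL = "https://hn.algolia.com/api/v1/items/{item_id}"


def fetch_item(item_id: int) -> dict:
    """Download one story with its nested comment tree (network call)."""
    url = ALGOLIA_ITEM_URL.format(item_id=item_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def _plain(text) -> str:
    """Turn HN comment HTML into one line of plain text."""
    text = re.sub(r"<p>", " ", text or "")   # paragraph breaks become spaces
    text = re.sub(r"<[^>]+>", "", text)      # drop remaining tags
    return " ".join(html.unescape(text).split())


def thread_to_markdown(item: dict) -> str:
    """Flatten a story and its comment tree into a markdown prompt body."""
    lines = [f"# {item.get('title') or '(untitled)'}"]
    if item.get("url"):
        lines.append(f"URL: {item['url']}")
    lines += ["", "## Discussion"]

    def walk(node: dict, depth: int) -> None:
        for child in node.get("children") or []:
            if child.get("text"):  # skip comments that have no text
                author = child.get("author") or "[deleted]"
                lines.append(f"{'  ' * depth}- {author}: {_plain(child['text'])}")
            walk(child, depth + 1)

    walk(item, 0)
    return "\n".join(lines)
```

Calling `thread_to_markdown(fetch_item(story_id))` for each story on a given day's front page would yield one prompt body per article. Indenting replies under their parent keeps the reply structure visible to the model.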

Prompt for ChatGPT 5.1 Thinking

The following is an article that appeared on Hacker News 10 years ago, and the discussion thread.

Let's use our benefit of hindsight now in 6 sections:

1. Give a brief summary of the article and the discussion thread.
2. What ended up happening to this topic? (research the topic briefly and write a summary)
3. Give out awards for "Most prescient" and "Most wrong" comments, considering what happened.
4. Mention any other fun or notable aspects of the article or discussion.
5. Give out grades to specific people for their comments, considering what happened.
6. At the end, give a final score (from 0-10) for how interesting this article and its retrospect analysis was.

As for the format of Section 5, use the header "Final grades" and follow it with an unordered list in the format:
- name: grade (optional comment)

Example:
Final grades
- speckx: A+ (excellent predictions on …)
- tosh: A (correctly predicted this or that …)
- keepamovin: A
- bgwalter: D
- fsflover: F (completely wrong on …)

For Section 6, use the prefix:
Article hindsight analysis interestingness score:

Processing pipeline

Submit the prompt to GPT 5.1 Thinking through the OpenAI API.
Parse the returned markdown.
Render the results as static HTML pages for easy browsing.
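
The parsing step can be sketched as below. This is an illustrative sketch, not the author's parser. It relies only on the two conventions the prompt asks for: a "Final grades" header followed by `- name: grade (optional comment)` lines, and the "Article hindsight analysis interestingness score:" prefix. The function name `parse_analysis` and the tolerance for markdown decoration around the header are assumptions.

```python
import re

# Header line such as "Final grades", "## Final grades" or "5. Final grades".
HEADER = re.compile(r"^[#*\s\d.]*final grades[:*\s]*$", re.IGNORECASE)

# List line such as "- speckx: A+ (excellent predictions on ...)".
GRADE_LINE = re.compile(
    r"^\s*[-*]\s*(?P<name>[^:]+?)\s*:\s*(?P<grade>[A-F][+-]?)"
    r"(?:\s*\((?P<comment>.*)\))?\s*$"
)

SCORE_LINE = re.compile(
    r"Article hindsight analysis interestingness score:\s*(\d+(?:\.\d+)?)"
)


def parse_analysis(markdown: str):
    """Return ([(name, grade, comment_or_None), ...], score_or_None)."""
    grades = []
    in_grades = False
    for line in markdown.splitlines():
        if HEADER.match(line):
            in_grades = True
            continue
        if in_grades:
            m = GRADE_LINE.match(line)
            if m:
                grades.append((m["name"], m["grade"], m["comment"]))
            elif line.strip():
                # First non-blank, non-grade line ends the grades section.
                in_grades = False
    score_match = SCORE_LINE.search(markdown)
    score = float(score_match.group(1)) if score_match else None
    return grades, score
```

Pinning the output format in the prompt is what makes a plain regular expression sufficient here. A response that drifts from the requested format simply yields fewer grades or a missing score rather than an exception.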

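Rendering and hosting

The generated pages are hosted on the web.
All intermediate data files are available as data.zip under the same URL prefix (deliberately not linked directly).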

Examples

A few selected pages from December 2015:

December 3, 2015 – Swift goes open source
December 6, 2015 – Figma launches
December 11, 2015 – Initial OpenAI announcement
December 16, 2015 – geohot is building Comma
December 22, 2015 – SpaceX launch webcast: Orbcomm-2 mission
December 28, 2015 – Theranos struggles

Hall of Fame

The Hall of Fame page ranks the top commenters of December 2015 by an IMDb-style average score. Notable names include:

pcwalton
tptacek
paulmd
cstross
greglindahl
moxie
hannob
0xcde4c3db
Manishearth
johncolanduoni

GPT 5.1 Thinking flagged many of their comments as especially insightful and prescient.
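
One way to read "IMDb-style average" is the weighted rating IMDb has described for its Top 250 list: each commenter's own mean is pulled toward the global mean, with less pull the more grades they have. The sketch below works under that assumption and is not taken from the author's code. The letter-to-points mapping (a GPA-like scale with ±0.3 for plus and minus) and the `prior_weight` value are hypothetical choices.

```python
from collections import defaultdict

# Assumed GPA-like scale; the source does not specify a mapping.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def grade_to_points(grade: str) -> float:
    """Map a letter grade such as 'A+', 'B' or 'C-' to a number."""
    points = GRADE_POINTS[grade[0]]
    if grade.endswith("+"):
        points += 0.3
    elif grade.endswith("-"):
        points -= 0.3
    return max(0.0, points)


def rank_commenters(grades, prior_weight: float = 5.0):
    """Rank (username, letter_grade) pairs by a shrunken average.

    score = (v * R + m * C) / (v + m), where R is the user's mean,
    v their number of grades, C the global mean and m = prior_weight.
    """
    per_user = defaultdict(list)
    for name, grade in grades:
        per_user[name].append(grade_to_points(grade))

    all_points = [p for pts in per_user.values() for p in pts]
    if not all_points:
        return []
    global_mean = sum(all_points) / len(all_points)

    ranked = []
    for name, pts in per_user.items():
        v = len(pts)
        user_mean = sum(pts) / v
        score = (v * user_mean + prior_weight * global_mean) / (v + prior_weight)
        ranked.append((name, round(score, 3), v))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

The point of the shrinkage is that a commenter with a single A+ does not outrank someone with ten consistent A grades, which a plain mean would allow.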

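Cost and performance

Processing 31 days × 30 articles = 930 LLM queries cost about $58 and took about 1 hour. Future models should make this kind of hindsight analysis cheaper and faster.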
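
For a sense of scale, the per-query figures implied by those totals can be worked out directly. The inputs are the approximate numbers stated above, so the results are rough. The throughput figure describes the run as a whole and says nothing about the latency of any single request.

```python
days, articles_per_day = 31, 30
total_cost_usd = 58.0      # approximate total from the text
wall_clock_minutes = 60.0  # approximate total from the text

queries = days * articles_per_day
cost_per_query = total_cost_usd / queries
queries_per_minute = queries / wall_clock_minutes

print(f"{queries} queries")
print(f"about ${cost_per_query:.3f} per query")
print(f"about {queries_per_minute:.1f} queries per minute overall")
```

That comes to roughly six cents per analyzed article.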

I hope you enjoy exploring the results. The code (along with the Opus 4.5 scripts) is open source, so anyone can reproduce or extend the analysis.
