post correspondence problem

2 weeks ago · ai

Thinking Tokens Are Not Created Equal: Why Benchmarks Can't Distinguish Between 'Search' and 'Insight' (A PCP Experiment)

Experiment Overview I’ve been running experiments to understand how different “reasoning” models actually spend their thinking budget. The results suggest that...