reasoning

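2 days ago · ai

How Google's “internal RL” could unlock long-horizon AI agents

Researchers at Google have developed a technique that makes it easier for AI models to learn complex reasoning tasks that usually cause LLMs to hallucinate or make errors.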

#reinforcement learning #internal RL #large language models #Google AI #reasoning #hallucination mitigation #AI research
1 week ago · ai

The 2M Token Trap: Why “Context Stuffing” Weakens Reasoning

#LLM #context window #token limit #prompt engineering #reasoning #AI performance
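2 weeks ago · ai

AI Agents: Mastering 3 Essential Patterns (ReAct). Part 2 of 3

The code for the patterns in Part 1 of this series is available on GitHub (repository). The “Tool‑Using” pattern: in Article 1, we gave the AI hands to interact with the outside world....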

#ReAct #AI agents #LLM #tool use #reasoning #prompt engineering
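3 weeks ago · ai

Implementing Vibe Proving with Reinforcement Learning

How to make LLMs reason with verifiable, step-by-step logic (Part 2). The post “Implementing Vibe Proving with Reinforcement Learning” first appeared on Towards Data…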

#reinforcement learning #large language models #prompt engineering #reasoning
0 months ago · ai

Understanding Vibe Proving

How to make LLMs reason with verifiable, step-by-step logic (Part 1). The post “Understanding Vibe Proving” first appeared on Towards Data Science....

#LLM #reasoning #verifiable logic #step-by-step reasoning #AI safety
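1 month ago · ai

My Google AI Agents Intensive Experience: Daily Reflections

🗓️ Day 1: Introduction to Agentic AI. Day 1 reshaped how I think about AI. I learned that an agent is not just a model; it is one that can perceive, …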

#Google AI #AI agents #agentic AI #LLM #autonomous systems #reasoning #planning #memory #tool use #AI intensive course
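1 month ago · ai

Thinking Tokens Are Not Equal: Why Benchmarks Can't Distinguish “Search” from “Insight” (A PCP Experiment)

Experiment overview: I have been running experiments to understand how different “reasoning” models actually use their thinking budgets. The results show that……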

#LLM #reasoning #token budgeting #benchmarks #post correspondence problem #model evaluation
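1 month ago · ai

🚀 Gemini 3 Is Changing the AI Landscape, and OpenAI Is Feeling It

2025 is shaping up to be the year of Gemini 3. Google's latest flagship model has not just caught up with OpenAI; many developers believe it has surpassed GPT‑4……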

#Gemini 3 #Google AI #OpenAI #large language model #multimodal AI #reasoning #LLM competition
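1 month ago · ai

[Paper] Escaping the Verifier: Learning to Reason via Demonstrations

Training large language models (LLMs) to reason often relies on reinforcement learning (RL) with task-specific verifiers. However, many real-world reasoning‑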

#LLM #reinforcement learning #reasoning #research paper