reinforcement learning

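2 days ago · ai

Why reinforcement learning plateaus without representation depth (and other key takeaways from NeurIPS 2025)

Every year, NeurIPS produces hundreds of impressive papers, and a handful of them subtly reshape how practitioners think about scaling, evaluation, and system design…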

#reinforcement learning #representation depth #NeurIPS 2025 #scaling laws #model evaluation #system design #machine learning research
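2 days ago · ai

How Google's “internal RL” unlocks long-horizon AI agents

Researchers at Google have developed a technique that makes it easier for AI models to learn complex reasoning tasks that usually cause LLMs to hallucinate or make errors.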

#reinforcement learning #internal RL #large language models #Google AI #reasoning #hallucination mitigation #AI research
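5 days ago · ai

Customizing multiturn AI agents with reinforcement learning

Using existing environment simulators and reward functions based on verifiable ground truth improves task success rates, even with small models and small-scale training.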

#reinforcement learning #multiturn agents #AI agents #environment simulators #reward functions #training data efficiency #Amazon Science
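1 week ago · ai

The behind-the-scenes work of building reliable AI agents

“Reinforcement learning gyms” train agents on many low-level tasks that must be chained together to carry out customer requests…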

#reinforcement learning #AI agents #reliability #training pipelines #Amazon Science #RL gyms #machine learning
2 weeks ago · ai

Deep Reinforcement Learning: The Actor-Critic Method

Robot friends collaborate to learn how to fly a drone. The post “Deep Reinforcement Learning: The Actor-Critic Method” first appeared on Towards Data Science…

#deep reinforcement learning #actor-critic #reinforcement learning #machine learning #AI #robotics
2 weeks ago · ai

From scaffolding to superhuman: how curriculum learning solved 2048 and Tetris

#curriculum learning #reinforcement learning #deep learning #game AI #2048 #Tetris #machine learning research
2 weeks ago · ai

Agents Under the Curve (AUC)

How to tell whether your agentic solution is actually better. The post “Agents Under the Curve AUC” first appeared on Towards Data Science…

#reinforcement learning #evaluation metrics #agents #AUC #machine learning
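3 weeks ago · ai

Implementing Vibe Proving with Reinforcement Learning

How to get LLMs to perform verifiable, step-by-step reasoning (Part 2). The post “Implementing Vibe Proving with Reinforcement Learning” first appeared on Towards Data…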

#reinforcement learning #large language models #prompt engineering #reasoning
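3 weeks ago · ai

Using the Reinforcement Learning GitHub package

Introduction: In machine learning, reinforcement learning (RL) is a paradigm in which the formulation of the problem matters as much as the algorithm itself. Unlike supervised learning…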

#reinforcement learning #RL #R programming #MDPtoolbox #policy iteration #machine learning #GitHub package
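3 weeks ago · ai

Language Agent Tree Search unifies reasoning, acting, and planning in language models

Learn more about how Language Agent Tree Search unifies reasoning and acting.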

#language-models #tree-search #MCTS #LLM-reasoning #planning #reinforcement-learning #AI-research #algorithm-design
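Less than a month ago · ai

Continuously hardening ChatGPT Atlas against prompt injection

OpenAI is strengthening ChatGPT Atlas's defenses against prompt injection attacks with automated red teaming trained through reinforcement learning. This proactive discovery…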

#ChatGPT #Atlas #prompt injection #reinforcement learning #red teaming #AI safety #security
Less than a month ago · ai

How I built an AI model that plays the Whot! card game

Cover image: “How I built AI model that plays Whot! card game”

#AI model #game AI #Whot card game #machine learning #reinforcement learning #Python #card game AI
