reinforcement learning — Page 2

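0 months ago · ai

Why AI Safety Should Be Enforced Structurally, Not Through Training

Most current AI safety work assumes the system is unsafe and tries to train it to behave better. - We add more data. - We add more constraints. - We add more fi...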

#AI safety #alignment #reinforcement learning #structural enforcement #machine learning #AI governance #reward hacking
1 month ago · ai

LLM Year in Review

December 19, 2025. 2025 was a year of strong and eventful progress for LLMs.

#LLM #RLVR #reinforcement learning #AI progress 2025 #language models
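1 month ago · ai

OpenAI Gym

Overview: OpenAI Gym is a simple testbed for teaching computers through trial and error. You put a task into it, the program tries actions, learns from its mistakes, …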

#openai #gym #reinforcement-learning #rl #machine-learning #ai-toolkit #benchmarks #research
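1 month ago · ai

AI agents fail 63% of the time on complex tasks. Patronus AI says its new "living" training worlds can fix that.

Patronus AI, the AI evaluation startup backed by $20 million in funding from investors including Lightspeed Venture Partners and Datadog, has launched a…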

#AI agents #reinforcement learning #training environments #synthetic worlds #Patronus AI #complex task performance #AI evaluation
1 month ago · software

Evolutionary Algorithms, Rendered in Real Time in Node.js

Reinforcement Learning, Evolutionary Algorithms, and Visual Computing. Reinforcement learning, evolutionary algorithms, and any technique that lets computers see are…

#evolutionary-algorithms #nodejs #graphics-rendering #tessera.js #real-time-visualization #algorithm-demo #reinforcement-learning
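1 month ago · ai

Temporal Contextual Attention in Hierarchical Multi-Agent Systems

Temporal Contextual Attention in Hierarchical Multi-Agent Systems with Non-Stationary Reward Functions. Challenge overview: consider a system with N hierarchical levels …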

#multi-agent systems #reinforcement learning #non-stationary rewards #temporal contextual attention #hierarchical agents #knowledge graph
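1 month ago · ai

Reinforcement Learning Environments: How AI Agents Learn Through Experience

AI agents improve through interaction and feedback, a process called reinforcement learning (RL). In this learning paradigm, the agent…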

#reinforcement learning #RL environments #AI agents #machine learning #generative AI #simulation #training
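1 month ago · ai

Ai2's New Olmo 3.1 Extends Reinforcement Learning Training for Stronger Reasoning Benchmark Results

The Allen Institute for AI (Ai2) recently released what it calls its most powerful model family yet, Olmo 3. But the company keeps iterating on these models, …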

#Olmo 3.1 #reinforcement learning #reasoning benchmarks #Allen Institute for AI #large language models #model efficiency
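1 month ago · ai

Tactical Tango: An In-Depth Comparison of Reinforcement

Reinforcement Learning: The Pragmatic Pioneer. Reinforcement learning (RL) has already succeeded in games, robotics, and sports. Its core idea is to provide a…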

#reinforcement learning #evolution strategies #AI sports coaches #machine learning comparison #RL vs ES #sports AI
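1 month ago · ai

[Paper] Escaping the Verifier: Learning to Reason from Examples

Training large language models (LLMs) to reason usually relies on reinforcement learning (RL) with task-specific verifiers. However, many real-world reasoning-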

#LLM #reinforcement learning #reasoning #research paper
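1 month ago · ai

[Paper] Aligning LLMs for Multi-Turn Conversational Outcomes with Iterative PPO

Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a major challenge, especially in goal-oriented settings such as AI mar...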

#LLM #reinforcement learning #PPO #RLHF #goal-oriented dialogue
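1 month ago · ai

[Paper] BAMAS: Structured Budget-Aware Multi-Agent Systems

Large language model (LLM)-based multi-agent systems have emerged as a powerful paradigm that enables autonomous agents to solve complex tasks. As these systems…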

#budget-aware AI #multi-agent systems #LLM cost optimization #integer linear programming #reinforcement learning
