reinforcement learning

1 day ago · ai

Why reinforcement learning plateaus without representation depth (and other key takeaways from NeurIPS 2025)

Every year, NeurIPS produces hundreds of impressive papers, and a handful that subtly reset how practitioners think about scaling, evaluation and system design....

#reinforcement learning #representation depth #NeurIPS 2025 #scaling laws #model evaluation #system design #machine learning research
2 days ago · ai

How Google’s 'internal RL' could unlock long-horizon AI agents

Researchers at Google have developed a technique that makes it easier for AI models to learn complex reasoning tasks that usually cause LLMs to hallucinate or f...

#reinforcement learning #internal RL #large language models #Google AI #reasoning #hallucination mitigation #AI research
5 days ago · ai

Customizing multiturn AI agents with reinforcement learning

Leveraging existing environment simulators and reward functions based on verifiable ground truth boosts task success rate, even with small models and small trai...

#reinforcement learning #multiturn agents #AI agents #environment simulators #reward functions #training data efficiency #Amazon Science
1 week ago · ai

The unseen work of building reliable AI agents

'Reinforcement learning gyms' train agents on the many low-level tasks that they must chain together to execute customer requests....

#reinforcement learning #AI agents #reliability #training pipelines #Amazon Science #RL gyms #machine learning
2 weeks ago · ai

Deep Reinforcement Learning: The Actor-Critic Method

Robot friends collaborate to learn to fly a drone. The post "Deep Reinforcement Learning: The Actor-Critic Method" appeared first on Towards Data Science.

#deep reinforcement learning #actor-critic #reinforcement learning #machine learning #AI #robotics
2 weeks ago · ai

Scaffolding to Superhuman: How Curriculum Learning Solved 2048 and Tetris

Article URL: https://kywch.github.io/blog/2025/12/curriculum-learning-2048-tetris/ · Comments URL: https://news.ycombinator.com/item?id=46445195

#curriculum learning #reinforcement learning #deep learning #game AI #2048 #Tetris #machine learning research
2 weeks ago · ai

Agents Under the Curve (AUC)

Towards understanding if your agentic solution is actually better. The post "Agents Under the Curve (AUC)" appeared first on Towards Data Science.

#reinforcement learning #evaluation metrics #agents #AUC #machine learning
3 weeks ago · ai

Implementing Vibe Proving with Reinforcement Learning

How to make LLMs reason with verifiable, step-by-step logic (Part 2). The post "Implementing Vibe Proving with Reinforcement Learning" appeared first on Towards Data...

#reinforcement learning #large language models #prompt engineering #reasoning
3 weeks ago · ai

Using the Reinforcement Learning GitHub Package

In machine learning, reinforcement learning (RL) is a paradigm where problem formulation matters as much as the algorithm itself. Unlike supervised...

#reinforcement learning #RL #R programming #MDPtoolbox #policy iteration #machine learning #GitHub package
3 weeks ago · ai

Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models

#language-models #tree-search #MCTS #LLM-reasoning #planning #reinforcement-learning #AI-research #algorithm-design
Less than a month ago · ai

Continuously hardening ChatGPT Atlas against prompt injection

OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-...

#ChatGPT #Atlas #prompt injection #reinforcement learning #red teaming #AI safety #security
Less than a month ago · ai

How I built AI model that plays Whot! card game

#AI model #game AI #Whot card game #machine learning #reinforcement learning #Python #card game AI
