reinforcement learning — Page 2

0 months ago · ai

Why AI safety should be enforced structurally, not trained in

Most current AI safety work assumes an unsafe system and tries to train better behavior into it.
- We add more data.
- We add more constraints.
- We add more fi...

#AI safety #alignment #reinforcement learning #structural enforcement #machine learning #AI governance #reward hacking
1 month ago · ai

LLM Year in Review

19 Dec, 2025 · 2025 has been a strong and eventful year of progress in LLMs. The...

#LLM #RLVR #reinforcement learning #AI progress 2025 #language models
1 month ago · ai

OpenAI Gym

Overview: OpenAI Gym is a simple playground for teaching computers through trial and error. You drop a task in, the program tries actions, learns from mistakes,...

#openai #gym #reinforcement-learning #rl #machine-learning #ai-toolkit #benchmarks #research
1 month ago · ai

AI agents fail 63% of the time on complex tasks. Patronus AI says its new 'living' training worlds can fix that.

Patronus AI, the artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a n...

#AI agents #reinforcement learning #training environments #synthetic worlds #Patronus AI #complex task performance #AI evaluation
1 month ago · software

Evolutionary Algorithms, Rendered Live in Node.js

Reinforcement Learning, Evolutionary Algorithms, and Visual Computing. Reinforcement learning, evolutionary algorithms, and anything that lets computers see are...

#evolutionary-algorithms #nodejs #graphics-rendering #tessera.js #real-time-visualization #algorithm-demo #reinforcement-learning
1 month ago · ai

Temporal Contextual Attention in Hierarchical Multi-Agent Systems with Non-Stationary Reward Functions

Challenge Overview: Consider a scenario with N hierarchic...

#multi-agent systems #reinforcement learning #non-stationary rewards #temporal contextual attention #hierarchical agents #knowledge graph
1 month ago · ai

Reinforcement Learning Environments: How AI Agents Learn Through Experience

Artificial intelligence agents improve through interaction and feedback, a process known as reinforcement learning (RL). In this learning paradigm, an agent opera...

#reinforcement learning #RL environments #AI agents #machine learning #generative AI #simulation #training
1 month ago · ai

Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

The Allen Institute for AI (Ai2) recently released what it calls its most powerful family of models yet, Olmo 3. But the company kept iterating on the models, exp...

#Olmo 3.1 #reinforcement learning #reasoning benchmarks #Allen Institute for AI #large language models #model efficiency
1 month ago · ai

The Tactical Tango: An In-Depth Comparison of Reinforcemen...

Reinforcement Learning: The Pragmatic Pioneer. Reinforcement Learning (RL) has achieved success in game playing, robotics, and sports. The core idea is to give an...

#reinforcement learning #evolution strategies #AI sports coaches #machine learning comparison #RL vs ES #sports AI
1 month ago · ai

[Paper] Escaping the Verifier: Learning to Reason via Demonstrations

Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-int...

#LLM #reinforcement learning #reasoning #research paper
1 month ago · ai

[Paper] Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO

Optimizing large language models (LLMs) for multi-turn conversational outcomes remains a significant challenge, especially in goal-oriented settings like AI mar...

#LLM #reinforcement learning #PPO #RLHF #goal-oriented dialogue
1 month ago · ai

[Paper] BAMAS: Structuring Budget-Aware Multi-Agent Systems

Large language model (LLM)-based multi-agent systems have emerged as a powerful paradigm for enabling autonomous agents to solve complex tasks. As these systems...

#budget-aware AI #multi-agent systems #LLM cost optimization #integer linear programming #reinforcement learning