[Paper] Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Published: February 20, 2026
5 min read
Source: arXiv - 2602.18386v1

Overview

The paper introduces a reinforcement‑learning (RL) technique that automatically tunes the two most critical parameters of the Pure Pursuit (PP) controller—look‑ahead distance and steering gain—while an autonomous race car is on the track. By learning a policy with Proximal Policy Optimization (PPO), the authors eliminate the need for hand‑crafted, track‑specific schedules and achieve faster, smoother laps both in simulation and on a real‑world F1TENTH vehicle.

Key Contributions

  • Joint online tuning of look‑ahead distance (L_d) and steering gain (g) using a single RL policy.
  • Compact state representation (vehicle speed + curvature “taps”) that keeps inference lightweight for real‑time deployment.
  • End‑to‑end integration with a ROS 2 stack, enabling the learned policy to run on an actual race car without per‑track retuning.
  • Comprehensive evaluation showing the RL‑augmented PP outperforms:
    • Fixed‑lookahead PP,
    • Velocity‑scheduled adaptive PP,
    • An RL variant that only adjusts look‑ahead, and
    • A kinematic Model Predictive Control (MPC) raceline tracker in lap time, tracking error, and steering smoothness.
  • Open‑source implementation in the F1TENTH Gym environment, facilitating reproducibility and further research.

Methodology

  1. Controller Backbone – Pure Pursuit

    • PP computes a target point on the reference path at a distance (L_d) ahead of the vehicle.
    • The steering command is
      \[ \delta = g \cdot \arctan\!\Big(\frac{2\,L\,\sin(\theta_e)}{L_d}\Big), \]
      where \(L\) is the wheelbase and \(\theta_e\) is the heading error to the target point.
    • Traditionally, (L_d) and (g) are set manually or via a simple speed‑based schedule.
  2. Learning Problem Formulation

    • State: a short history (taps) of vehicle speed and road curvature sampled along the upcoming path (e.g., 5‑10 points).
    • Action: a continuous pair (L_d, g) produced by a neural‑network policy.
    • Reward: combines lap‑time reduction, penalization of large lateral error, and a smoothness term for steering changes.
  3. Training with PPO

    • PPO, a stable on‑policy RL algorithm, updates the policy by clipping probability ratios to keep updates conservative.
    • Training occurs entirely in the F1TENTH Gym simulator, which provides realistic vehicle dynamics and sensor noise.
    • Curriculum learning (gradually increasing target speeds) helps the policy discover robust parameter schedules.
  4. Deployment

    • The trained network is exported as a TensorRT‑compatible model for low‑latency inference.
    • A ROS 2 node reads the current speed and curvature taps, queries the policy, and feeds the resulting (L_d, g) back into the PP controller at 50 Hz.
    • Light exponential smoothing is applied to the steering command to avoid high‑frequency jitter.
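The geometric law in step 1 fits in a few lines. A minimal Python sketch, assuming a 0.33 m wheelbase (a typical F1TENTH value, not taken from the paper):

```python
import math

def pure_pursuit_steering(heading_error, lookahead, gain, wheelbase=0.33):
    """delta = g * arctan(2 * L * sin(theta_e) / L_d).

    heading_error: theta_e (rad), angle between the vehicle heading and
    the line to the target point; lookahead: L_d (m); gain: g;
    wheelbase: L (m).
    """
    return gain * math.atan2(2.0 * wheelbase * math.sin(heading_error),
                             lookahead)

print(pure_pursuit_steering(0.0, 1.5, 1.0))  # → 0.0 (no heading error, no steering)
```

Note how a longer look‑ahead softens the response to the same heading error, which is exactly why tuning L_d against speed and curvature matters.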
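Step 2's reward can be sketched as a weighted sum of the three terms; the weights below are illustrative assumptions, not the paper's coefficients:

```python
def shaped_reward(track_progress, lateral_error, steering_delta,
                  w_progress=1.0, w_error=0.5, w_smooth=0.1):
    """Per-step reward: progress along the track (a proxy for lap-time
    reduction) minus penalties for lateral deviation and for changes in
    the steering command (the smoothness term)."""
    return (w_progress * track_progress
            - w_error * abs(lateral_error)
            - w_smooth * abs(steering_delta))
```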
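The clipping mentioned in step 3 is the standard PPO surrogate objective; a generic NumPy rendering (not the authors' training code):

```python
import numpy as np

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, eps=0.2):
    """Clipped surrogate (to be maximized). The probability ratio
    r = exp(log pi_new - log pi_old) is clipped to [1 - eps, 1 + eps],
    so a single gradient step cannot move the policy too far from the
    one that collected the data."""
    ratio = np.exp(log_prob_new - log_prob_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()
```

With identical old and new policies the ratio is 1 and the objective reduces to the mean advantage; large ratios with positive advantage are capped at (1 + eps) times the advantage.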
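The light smoothing in step 4 is ordinarily a first‑order exponential filter; a sketch with an assumed smoothing constant (the paper does not state the value):

```python
class SteeringSmoother:
    """First-order exponential smoothing: y_t = a*x_t + (1 - a)*y_{t-1}.
    alpha = 1.0 passes commands through unchanged; smaller alpha
    suppresses high-frequency jitter at the cost of some added lag."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.y = None

    def update(self, command):
        if self.y is None:
            self.y = command  # initialize on the first sample
        else:
            self.y = self.alpha * command + (1.0 - self.alpha) * self.y
        return self.y
```

At a 50 Hz control rate, alpha trades responsiveness against jitter suppression, matching the paper's goal of avoiding high‑frequency steering changes.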

Results & Findings

| Test Condition        | Lap Time (s) | Mean Lateral Error (m) | Steering Jerk (rad/s³) |
|-----------------------|--------------|------------------------|------------------------|
| Fixed‑lookahead PP    | 12.84        | 0.28                   | 4.9                    |
| Velocity‑scheduled PP | 12.41        | 0.22                   | 4.2                    |
| RL‑only‑lookahead     | 12.18        | 0.19                   | 3.8                    |
| RL‑joint (L_d, g)     | 11.73        | 0.15                   | 3.1                    |
| Kinematic MPC         | 12.05        | 0.17                   | 3.4                    |
  • The RL‑joint controller reduced lap time by ~9 % compared with the baseline fixed‑lookahead PP.
  • Lateral deviation dropped by ~46 %, indicating tighter adherence to the optimal racing line.
  • Steering smoothness improved, which translates to less wear on actuators and better passenger comfort (if applied to passenger‑grade vehicles).
  • Real‑car experiments on a 1:10 scale F1TENTH platform reproduced the simulation gains, confirming that the policy generalizes across the sim‑to‑real gap.

Practical Implications

  • Plug‑and‑play controller upgrades: Existing PP‑based stacks (common in low‑cost autonomous platforms) can be enhanced simply by adding the RL policy node—no redesign of the core controller is required.
  • Reduced engineering effort: Teams no longer need to hand‑tune look‑ahead schedules for each new track or speed profile, freeing resources for higher‑level tasks such as perception or strategy.
  • Scalable to full‑size racing: While demonstrated on a 1:10 scale car, the same approach can be transferred to larger platforms where PP is still used (e.g., autonomous delivery robots, off‑road vehicles).
  • Hybrid control paradigm: Shows that classical geometric controllers can be “smartened” with data‑driven parameter adaptation, offering a middle ground between pure model‑based and end‑to‑end learning methods.
  • Potential for safety‑critical domains: The smoothness penalty in the reward function ensures that the learned policy respects actuator limits, making it a candidate for applications where abrupt steering is undesirable (e.g., agricultural machinery, warehouse AGVs).

Limitations & Future Work

  • State abstraction: The policy relies on pre‑computed curvature taps; if the map is unavailable or the vehicle deviates far from the planned path, the input may become inaccurate.
  • Generalization to drastically different dynamics: The network was trained on a specific vehicle model; transferring to cars with different wheelbases, tire models, or higher speeds may require additional fine‑tuning or domain‑randomization.
  • Safety guarantees: While PPO yields stable policies, formal verification of the resulting controller’s stability under all operating conditions is not addressed.
  • Future directions suggested by the authors include:
    • Extending the state to incorporate live perception (e.g., LiDAR‑derived curvature) for map‑less operation.
    • Combining the RL‑tuned PP with higher‑level trajectory planners to handle overtaking or obstacle avoidance.
    • Investigating meta‑learning techniques to enable rapid adaptation to new vehicle platforms with minimal additional data.

Authors

  • Mohamed Elgouhary
  • Amr S. El‑Wakeel

Paper Information

  • arXiv ID: 2602.18386v1
  • Categories: cs.RO, cs.AI, cs.LG, eess.SY
  • Published: February 20, 2026
