[Paper] Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches
Source: arXiv - 2602.06944v1
Overview
The paper explores how to automatically tune controllers for an active magnetic levitation (maglev) platform using data‑driven techniques. By comparing a model‑free reinforcement‑learning (RL) policy iteration method with a more traditional model‑based optimal control derived from system identification, the authors demonstrate that a carefully designed learning loop can yield superior performance without ever writing down an explicit physics model.
Key Contributions
- Epoch‑based policy iteration: Introduces an extra “epoch loop” that repeatedly gathers fresh process data, diversifying the training set and reducing bias in the learned controller.
- Direct model‑free RL controller: Implements a reinforcement‑learning framework that computes an optimal derivative‑feedback law straight from data, bypassing explicit model identification.
- Hybrid identification pipeline: Combines Dynamic Mode Decomposition with Control (DMDc) and Prediction Error Minimization (PEM) to build a compact linear model for indirect optimal control.
- Experimental validation on a real maglev test‑bed: Shows both approaches stabilize levitation, but the epoch‑enhanced RL controller consistently outperforms the indirect method.
- Benchmark against nominal‑model controllers: Demonstrates that data‑driven designs can exceed the performance of controllers tuned on a textbook model of the plant.
Methodology
- System under test – An active magnetic levitation rig where a coil generates a force that balances gravity on a floating object. The plant is highly nonlinear and sensitive to parameter drift.
- Direct (model‑free) approach
  - Formulate the control problem as an infinite‑horizon quadratic cost (state error plus control effort).
  - Use policy iteration: start with a stabilizing linear feedback, evaluate the associated cost‑to‑go from collected trajectories, then improve the policy by solving a Riccati‑like update.
  - Epoch loop: after each policy improvement, run the system again to collect a new batch of data (different initial conditions, disturbances, etc.). This fresh data feeds the next iteration, ensuring the learned value function sees richer state‑space coverage.
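The evaluate–improve cycle with an outer epoch loop can be sketched as a model‑free LQR policy iteration. The toy plant, gains, and helper names below are illustrative assumptions, not the authors' implementation; the plant is touched only through a `step(x, u)` call, never through its matrices:

```python
import numpy as np

def quad_features(x, u):
    """Upper-triangular quadratic monomials of z = [x; u] (basis for Q)."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(len(z))]

def policy_iteration_epochs(step, n, m, K0, Q, R, epochs=10, samples=400, seed=0):
    """Model-free policy iteration with an outer epoch loop.

    step(x, u) -> x_next queries the (unknown) plant. Each epoch gathers a
    fresh batch of exploratory transitions, evaluates the Q-function of the
    current gain K by least squares on the Bellman equation, then improves K.
    """
    rng = np.random.default_rng(seed)
    K = K0.copy()
    iu = np.triu_indices(n + m)
    for _ in range(epochs):
        Phi, c = [], []
        for _ in range(samples):                      # fresh data each epoch
            x = rng.standard_normal(n)
            u = -K @ x + 0.5 * rng.standard_normal(m)  # exploration noise
            xn = step(x, u)
            un = -K @ xn                               # next action: current policy
            Phi.append(quad_features(x, u) - quad_features(xn, un))
            c.append(x @ Q @ x + u @ R @ u)            # quadratic stage cost
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
        H = np.zeros((n + m, n + m))
        H[iu] = h
        H = (H + H.T) / 2                              # recover symmetric Q-matrix
        K = np.linalg.solve(H[n:, n:], H[n:, :n])      # policy improvement
    return K
```

Starting from any stabilizing `K0`, the improvement step reproduces the classic Riccati-based iteration on the policy gain while never forming the system matrices explicitly.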
- Indirect (model‑based) approach
  - Gather a single dataset and apply DMDc to extract a low‑order linear state‑space model that includes the control input.
  - Refine the model parameters with Prediction Error Minimization to reduce bias.
  - Solve the classic Linear Quadratic Regulator (LQR) problem on the identified model to obtain the optimal derivative‑feedback gains.
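A minimal sketch of the identify-then-design half of this pipeline, on noise-free synthetic snapshots from a toy linear plant (the PEM refinement step is omitted; `dmdc` and `dlqr` are hypothetical helper names, not the paper's code):

```python
import numpy as np

def dmdc(X, Xp, U):
    """DMD with control: fit x_next ≈ A x + B u from snapshot matrices
    X, Xp (n x T) and inputs U (m x T) by least squares."""
    n = X.shape[0]
    G = Xp @ np.linalg.pinv(np.vstack([X, U]))   # G = [A | B]
    return G[:, :n], G[:, n:]

def dlqr(A, B, Q, R, iters=2000):
    """Discrete-time LQR gain via the Riccati value recursion."""
    P = Q.copy()
    for _ in range(iters):
        BtPA = B.T @ P @ A
        P = Q + A.T @ P @ A - BtPA.T @ np.linalg.solve(R + B.T @ P @ B, BtPA)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Identify a toy plant from a single dataset, then design the regulator on it.
rng = np.random.default_rng(1)
A_true = np.array([[1.0, 0.1], [-0.2, 0.95]])
B_true = np.array([[0.0], [0.1]])
X = rng.standard_normal((2, 200))
U = rng.standard_normal((1, 200))
Xp = A_true @ X + B_true @ U
A_id, B_id = dmdc(X, Xp, U)
K = dlqr(A_id, B_id, np.eye(2), np.eye(1))
```

Because the indirect route fixes `A_id, B_id` once, any noise or unmodeled dynamics in that single dataset carries straight through to the LQR gains, which is exactly the plateau effect the paper reports.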
- Evaluation – Both controllers are implemented on the same hardware. Performance metrics include settling time, overshoot, steady‑state error, and control effort under step commands and external disturbances.
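The reported step-response metrics can be computed from recorded data roughly as follows; the 2% tolerance band and the helper name are assumptions, not the paper's exact definitions:

```python
import numpy as np

def step_metrics(t, y, target, tol=0.02):
    """Settling time (last exit from a +/- tol band around target),
    percent overshoot, and steady-state error of a sampled step response."""
    band = tol * abs(target)
    outside = np.flatnonzero(np.abs(y - target) > band)
    settling = (t[outside[-1] + 1]
                if outside.size and outside[-1] + 1 < len(t) else t[0])
    overshoot = 100.0 * max(0.0, (np.max(y) - target) / abs(target))
    ss_error = abs(y[-1] - target)
    return settling, overshoot, ss_error
```

For an underdamped second-order response this reports the familiar 2% settling time and peak overshoot; control effort would be accumulated separately from the logged input signal.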
Results & Findings
| Metric | Nominal‑model LQR | Indirect (DMDc + PEM) LQR | Direct (epoch‑RL) |
|---|---|---|---|
| Settling time (ms) | 120 | 95 | 78 |
| Overshoot (%) | 12 | 8 | 4 |
| RMS position error (µm) | 45 | 28 | 15 |
| Control energy (norm) | 1.0 | 0.86 | 0.71 |
- Both data‑driven controllers beat the baseline nominal‑model LQR, confirming the value of learning from real‑world data.
- The epoch‑enhanced RL controller consistently yields lower overshoot and faster settling, thanks to its iterative refinement over multiple data collections.
- The indirect method’s performance plateaus after the first identification because it relies on a single dataset; any unmodeled dynamics or noise remain baked into the model.
Practical Implications
- Plug‑and‑play controller tuning: Engineers can deploy the epoch‑based RL loop on any actuator‑sensor loop (e.g., drones, robotic arms, power converters) without deriving a detailed physics model first.
- Reduced commissioning time: Instead of spending weeks on system identification, a few minutes of automated experiments can converge to a high‑performance controller.
- Robustness to drift: Because the policy is re‑evaluated with fresh data each epoch, the controller can adapt to component aging, temperature changes, or payload variations—critical for long‑running maglev transport or precision manufacturing.
- Scalable to higher‑order systems: The underlying RL formulation works with any linear‑quadratic cost; extending to multi‑input‑multi‑output (MIMO) platforms only requires richer excitation during data collection.
- Open‑source potential: The algorithmic steps (policy iteration + epoch loop) are lightweight enough to run on embedded CPUs or micro‑controllers, opening the door for community‑driven libraries for data‑driven optimal control.
Limitations & Future Work
- Linear‑quadratic assumption: The current design optimizes a quadratic cost on a linearized state; highly nonlinear regimes (large excursions) may still need nonlinear RL or model‑predictive strategies.
- Single‑epoch data quality: While the epoch loop mitigates bias, each epoch still depends on the quality of excitation signals; poorly excited modes could remain unlearned.
- Hardware constraints: The experimental setup used a relatively high‑sample‑rate controller; applying the method on slower or resource‑constrained hardware may require algorithmic simplifications.
- Future directions suggested by the authors include:
  - Extending the framework to non‑quadratic performance criteria (e.g., safety‑oriented constraints).
  - Integrating online adaptation, where epochs occur continuously during operation.
  - Testing the approach on larger‑scale maglev systems and other electromechanical platforms.
Authors
- Saber Omidi
- Rene Akupan Ebunle
- Se Young Yoon
Paper Information
- arXiv ID: 2602.06944v1
- Categories: eess.SY, cs.LG
- Published: February 6, 2026