[Paper] Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches

Published: February 6, 2026 at 01:42 PM EST
4 min read

Source: arXiv - 2602.06944v1

Overview

The paper explores how to automatically tune controllers for an active magnetic levitation (maglev) platform using data‑driven techniques. By comparing a model‑free reinforcement‑learning (RL) policy iteration method with a more traditional model‑based optimal control derived from system identification, the authors demonstrate that a carefully designed learning loop can yield superior performance without ever writing down an explicit physics model.

Key Contributions

  • Epoch‑based policy iteration: Introduces an extra “epoch loop” that repeatedly gathers fresh process data, diversifying the training set and reducing bias in the learned controller.
  • Direct model‑free RL controller: Implements a reinforcement‑learning framework that computes an optimal derivative‑feedback law straight from data, bypassing explicit model identification.
  • Hybrid identification pipeline: Combines Dynamic Mode Decomposition with Control (DMDc) and Prediction Error Minimization (PEM) to build a compact linear model for indirect optimal control.
  • Experimental validation on a real maglev test‑bed: Shows both approaches stabilize levitation, but the epoch‑enhanced RL controller consistently outperforms the indirect method.
  • Benchmark against nominal‑model controllers: Demonstrates that data‑driven designs can exceed the performance of controllers tuned on a textbook model of the plant.

Methodology

  1. System under test – An active magnetic levitation rig where a coil generates a force that balances gravity on a floating object. The plant is highly nonlinear and sensitive to parameter drift.

  2. Direct (model‑free) approach

    • Formulate the control problem as an infinite‑horizon quadratic cost (state‑error + control‑effort).
    • Use policy iteration: start with a stabilizing linear feedback, evaluate the associated cost‑to‑go via collected trajectories, then improve the policy by solving a Riccati‑like update.
    • Epoch loop: after each policy improvement, run the system again to collect a new batch of data (different initial conditions, disturbances, etc.). This fresh data feeds the next iteration, ensuring the learned value function sees a richer state‑space coverage.
  3. Indirect (model‑based) approach

    • Gather a single dataset and apply DMDc to extract a low‑order linear state‑space model that includes the control input.
    • Refine the model parameters with Prediction Error Minimization to reduce bias.
    • Solve the classic Linear Quadratic Regulator (LQR) problem on the identified model to obtain the optimal derivative feedback gains.
  4. Evaluation – Both controllers are implemented on the same hardware. Performance metrics include settling time, overshoot, steady‑state error, and control effort under step commands and external disturbances.
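The direct approach's evaluate-then-improve loop can be sketched in a compact model-based form. This is a minimal stand-in, not the paper's implementation: the paper estimates the cost-to-go from measured trajectories, which is replaced here by a Lyapunov solve on a toy linearized plant, and the matrices `A`, `B` and the initial gain `K` are illustrative choices.

```python
import numpy as np

def dlyap(Acl, Qk, iters=2000):
    # Fixed-point solve of P = Acl.T @ P @ Acl + Qk (valid when Acl is Schur stable);
    # this plays the role of evaluating the cost-to-go of the current policy.
    P = Qk.copy()
    for _ in range(iters):
        P = Acl.T @ P @ Acl + Qk
    return P

# Toy discretized, open-loop-unstable plant standing in for the linearized maglev dynamics
A = np.array([[1.0, 0.01],
              [0.2, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)   # quadratic state-error and control-effort weights

# Policy iteration needs a stabilizing initial gain (chosen here by pole placement)
K = np.array([[22.0, 3.0]])
for epoch in range(10):
    Acl = A - B @ K                        # closed loop under the current policy
    P = dlyap(Acl, Q + K.T @ R @ K)        # policy evaluation: cost-to-go of current gain
    K = np.linalg.solve(R + B.T @ P @ B,   # policy improvement: Riccati-like update
                        B.T @ P @ A)
```

In the paper's epoch loop, each pass would additionally re-run the rig with the updated gain (different initial conditions, disturbances) to collect fresh data before the next evaluation; here every pass reuses the same toy model.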
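The indirect pipeline can likewise be sketched. The snippet below fits the pair [A | B] by least squares from input-state snapshots (the core of DMDc) and then solves the LQR problem on the identified model via Riccati value iteration. The "true" plant is a toy stand-in used only to generate the identification dataset, and the PEM refinement step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.01], [0.2, 1.0]])   # unknown plant, used only to generate data
B_true = np.array([[0.0], [0.1]])

# One excitation experiment: random input sequence, recorded state snapshots
N = 200
x = np.zeros((2, N + 1))
u = rng.normal(size=(1, N))
for k in range(N):
    x[:, k + 1] = A_true @ x[:, k] + B_true[:, 0] * u[0, k]

# DMDc: least-squares fit of [A | B] from the stacked snapshot matrices
Omega = np.vstack([x[:, :-1], u])
AB = x[:, 1:] @ np.linalg.pinv(Omega)
A_id, B_id = AB[:, :2], AB[:, 2:]

# LQR on the identified model via Riccati value iteration
Q, R = np.eye(2), np.eye(1)
P = Q.copy()
for _ in range(1000):
    S = R + B_id.T @ P @ B_id
    P = Q + A_id.T @ P @ A_id - A_id.T @ P @ B_id @ np.linalg.solve(S, B_id.T @ P @ A_id)
K = np.linalg.solve(R + B_id.T @ P @ B_id, B_id.T @ P @ A_id)
```

In this noise-free simulation the fit is essentially exact; on the real rig, measurement noise and unmodeled dynamics are what make the PEM refinement (and the paper's repeated-epoch alternative) worthwhile.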

Results & Findings

| Metric | Nominal‑model LQR | Indirect (DMDc + PEM) LQR | Direct (epoch‑RL) |
|---|---|---|---|
| Settling time (ms) | 120 | 95 | 78 |
| Overshoot (%) | 12 | 8 | 4 |
| RMS position error (µm) | 45 | 28 | 15 |
| Control energy (norm.) | 1.00 | 0.86 | 0.71 |

  • Both data‑driven controllers beat the baseline nominal‑model LQR, confirming the value of learning from real‑world data.
  • The epoch‑enhanced RL controller consistently yields lower overshoot and faster settling, thanks to its iterative refinement over multiple data collections.
  • The indirect method’s performance plateaus after the first identification because it relies on a single dataset; any unmodeled dynamics or noise remain baked into the model.
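Metrics like those in the table can be computed directly from a logged step response. The helper below is hypothetical, not from the paper; the function name and the ±2% settling band are our choices.

```python
import numpy as np

def step_metrics(t, y, ref, band=0.02):
    """Settling time, percent overshoot, and RMS error for a step-response log."""
    err = y - ref
    overshoot = max(0.0, (y.max() - ref) / abs(ref) * 100.0)
    outside = np.abs(err) > band * abs(ref)   # samples outside the +/- band around ref
    idx = np.flatnonzero(outside)
    if idx.size == 0:
        ts = t[0]                 # never left the band
    elif idx[-1] + 1 < t.size:
        ts = t[idx[-1] + 1]       # first sample after the last excursion
    else:
        ts = t[-1]                # never settled within the log
    rms = float(np.sqrt(np.mean(err ** 2)))
    return ts, overshoot, rms

# Usage on a synthetic underdamped response (stand-in for recorded maglev data)
t = np.linspace(0.0, 2.0, 2001)
y = 1.0 - np.exp(-5.0 * t) * np.cos(20.0 * t)
ts, ov, rms = step_metrics(t, y, ref=1.0)
```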

Practical Implications

  • Plug‑and‑play controller tuning: Engineers can deploy the epoch‑based RL loop on any actuator‑sensor loop (e.g., drones, robotic arms, power converters) without deriving a detailed physics model first.
  • Reduced commissioning time: Instead of spending weeks on system identification, a few minutes of automated experiments can be enough to converge on a high‑performance controller.
  • Robustness to drift: Because the policy is re‑evaluated with fresh data each epoch, the controller can adapt to component aging, temperature changes, or payload variations—critical for long‑running maglev transport or precision manufacturing.
  • Scalable to higher‑order systems: The underlying RL formulation works with any linear‑quadratic cost; extending to multi‑input‑multi‑output (MIMO) platforms only requires richer excitation during data collection.
  • Open‑source potential: The algorithmic steps (policy iteration + epoch loop) are lightweight enough to run on embedded CPUs or micro‑controllers, opening the door for community‑driven libraries for data‑driven optimal control.

Limitations & Future Work

  • Linear‑quadratic assumption: The current design optimizes a quadratic cost on a linearized state; highly nonlinear regimes (large excursions) may still need nonlinear RL or model‑predictive strategies.
  • Single‑epoch data quality: While the epoch loop mitigates bias, each epoch still depends on the quality of excitation signals; poorly excited modes could remain unlearned.
  • Hardware constraints: The experimental setup used a relatively high‑sample‑rate controller; applying the method on slower or resource‑constrained hardware may require algorithmic simplifications.
  • Future directions suggested by the authors include:
    1. Extending the framework to non‑quadratic performance criteria (e.g., safety‑oriented constraints).
    2. Integrating online adaptation where epochs occur continuously during operation.
    3. Testing the approach on larger‑scale maglev systems and other electromechanical platforms.

Authors

  • Saber Omidi
  • Rene Akupan Ebunle
  • Se Young Yoon

Paper Information

  • arXiv ID: 2602.06944v1
  • Categories: eess.SY, cs.LG
  • Published: February 6, 2026