[Paper] Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents

Published: December 9, 2025 at 01:04 PM EST
3 min read
Source: arXiv - 2512.08870v1

Overview

The paper introduces Fed‑SE, a federated learning framework that lets large‑language‑model (LLM) agents keep improving across many privacy‑restricted environments without ever sharing raw data. By combining reward‑filtered local fine‑tuning with a low‑rank global aggregation step, Fed‑SE overcomes the instability that typically plagues federated training of open‑ended agents.

Key Contributions

  • Federated Self‑Evolution paradigm: a local‑evolution / global‑aggregation loop tailored for LLM agents that must learn from sparse, trajectory‑level feedback.
  • Gradient‑stable local updates: uses parameter‑efficient fine‑tuning (e.g., LoRA) on a curated set of high‑return trajectories, dramatically reducing gradient conflicts.
  • Low‑rank subspace aggregation: projects client updates onto a shared low‑dimensional subspace, isolating environment‑specific dynamics and mitigating negative transfer.
  • Empirical validation: experiments on five heterogeneous benchmark environments show an ~18‑percentage‑point gain in average task success over standard federated baselines (78 % vs. 60 % for FedAvg; see the results table below).
  • Privacy‑first design: no raw interaction logs leave the client device, satisfying strict data‑privacy regulations common in enterprise and edge deployments.

Methodology

  1. Local Evolution

    • Each client runs its LLM agent in its own environment (e.g., a specific workflow‑automation task or a game level).
    • The agent collects interaction trajectories and computes a scalar return (success/failure, reward).
    • Only the top‑k high‑return trajectories are kept; the rest are discarded to avoid noisy gradients.
    • The agent is fine‑tuned on this filtered set using a parameter‑efficient adapter (LoRA, prefix‑tuning, etc.), so only a tiny subset of weights is updated (see the first sketch after this list).
  2. Global Aggregation

    • Clients encrypt and send their adapter updates (not the full model) to a central server.
    • The server performs low‑rank matrix factorization on the stacked updates, extracting a shared subspace that captures common knowledge while filtering out environment‑specific noise.
    • The aggregated subspace is broadcast back; each client projects the global update onto its local adapter, completing the evolution cycle (see the second sketch after this list).
  3. Iterative Loop

    • The process repeats for multiple communication rounds, gradually improving the agents while keeping data on‑device.
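
To make the local step concrete, here is a minimal sketch of one client‑side round, assuming a Hugging Face PEFT‑style LoRA setup. The helpers `run_episode` and `fine_tune` and all hyperparameters are hypothetical stand‑ins, not details from the paper.

```python
# Sketch of one client-side "local evolution" round. `run_episode` and
# `fine_tune` are hypothetical helpers; LoRA settings are illustrative.
from peft import LoraConfig, get_peft_model

def local_evolution_step(base_model, env, n_rollouts=64, k=16):
    # 1. Roll out the agent and score each trajectory with a scalar return.
    trajectories = [run_episode(base_model, env) for _ in range(n_rollouts)]
    # 2. Keep only the top-k high-return trajectories to suppress noisy gradients.
    top_k = sorted(trajectories, key=lambda t: t.reward, reverse=True)[:k]
    # 3. Attach a LoRA adapter so only a small fraction of weights is trainable.
    lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(base_model, lora_cfg)
    # 4. Fine-tune on the filtered trajectories only (training loop omitted).
    fine_tune(model, top_k)
    # 5. Ship just the adapter weights -- the only payload sent to the server.
    return {n: p.detach().cpu() for n, p in model.named_parameters() if "lora_" in n}
```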
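
The global step can likewise be pictured as a truncated SVD over the stacked client updates. The sketch below flattens each adapter update into a single vector for simplicity; this is an assumption, and the paper's factorization may operate per weight matrix.

```python
# Sketch of the server-side low-rank aggregation. Assumes each client sends
# its adapter update flattened to one vector (a simplification).
import numpy as np

def aggregate_low_rank(client_updates, rank=4):
    # Stack flattened client updates into an (n_clients, n_params) matrix.
    U = np.stack(client_updates)                 # shape: (C, D)
    # Keep the top-`rank` right-singular vectors: the shared subspace that
    # captures cross-environment structure and drops client-specific noise.
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    basis = Vt[:rank]                            # shape: (rank, D)
    # Project every client's update onto the shared subspace, then average.
    projected = (U @ basis.T) @ basis            # shape: (C, D)
    return projected.mean(axis=0)                # broadcast back to all clients
```

Each client then maps the returned vector back into its adapter weights, which closes one communication round.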

Results & Findings

| Metric | Fed‑SE | FedAvg (baseline) | FedProx (baseline) |
| --- | --- | --- | --- |
| Avg. task success ↑ | 78 % | 60 % | 62 % |
| Communication overhead (MB/round) | 1.2 | 1.2 | 1.2 |
| Convergence rounds (to 70 % success) | 12 | 22 | 20 |

  • Stability: Gradient variance across clients dropped by ~45 % thanks to trajectory filtering and low‑rank aggregation.
  • Negative transfer reduction: Environments with contradictory objectives (e.g., “minimize steps” vs. “explore thoroughly”) no longer dragged each other down.
  • Scalability: Adding two more heterogeneous clients only increased the communication payload linearly, confirming the method’s suitability for large federations.

Practical Implications

  • Enterprise AI assistants can continuously improve across different departments (HR, finance, support) without exposing confidential logs.
  • Edge‑deployed LLM bots (e.g., in IoT devices, autonomous drones) can share learning signals while respecting on‑device privacy constraints.
  • Rapid prototyping: Teams can spin up new environment‑specific agents, let them self‑evolve locally, and then merge improvements globally in a few communication rounds.
  • Reduced infrastructure cost: Because only low‑dimensional adapters are transmitted, bandwidth and storage requirements stay minimal, making Fed‑SE viable for mobile or satellite links.

Limitations & Future Work

  • Heterogeneity ceiling: When client environments are extremely divergent (e.g., language translation vs. code generation), the low‑rank subspace may still capture conflicting signals, limiting gains.
  • Reward sparsity: The approach relies on enough high‑return trajectories; in tasks with extremely sparse rewards, additional exploration strategies may be needed.
  • Security considerations: While raw data never leaves the client, model updates could still leak information; integrating differential privacy or secure aggregation is a natural next step.
  • Broader benchmarks: The authors plan to test Fed‑SE on larger LLMs (e.g., 70B parameters) and on real‑world corporate datasets to assess scalability and robustness further.

Authors

  • Xiang Chen
  • Yuling Shi
  • Qizhen Lan
  • Yuchao Qiu
  • Xiaodong Gu

Paper Information

  • arXiv ID: 2512.08870v1
  • Categories: cs.LG, cs.AI
  • Published: December 9, 2025