[Paper] CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Published: March 18, 2026 at 11:21 PM EDT
Source: arXiv

Overview

The widespread deployment of large language models (LLMs) calls for post-hoc methods that can flexibly adapt models to evolving safety requirements. Meanwhile, the rapidly expanding open-source LLM ecosystem has produced a diverse collection of models that already exhibit various safety-related functionalities. This motivates a shift from building safety functionality from scratch to reusing existing functionality from external models, avoiding costly data collection and training. In this paper, we present Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM. Because it operates at the neuron level, CNT enables modular, function-level adaptation and supports both function addition and function deletion. We evaluate CNT on seven popular LLMs across three representative applications: safety disalignment, alignment enhancement, and bias removal. Experimental results show that CNT achieves targeted transfer of safety-oriented functionality with minimal performance degradation (less than 1% for most models) and consistently outperforms five baselines, demonstrating its generality and practical effectiveness.
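
The abstract does not describe the transfer mechanics, so the following is only a minimal sketch of what "transferring neurons" could mean for a single transformer MLP layer. It assumes, and none of this comes from the paper, that donor and target share the same layer shapes, that neuron i is defined by row i of an input projection and column i of an output projection, and that transfer means copying those weights verbatim. The function name transfer_neurons and the toy weights are hypothetical.

```python
from copy import deepcopy
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _shape(m: Matrix) -> Tuple[int, int]:
    # (rows, columns) of a list-of-lists matrix.
    return (len(m), len(m[0]) if m else 0)


def transfer_neurons(
    target_in: Matrix,
    target_out: Matrix,
    donor_in: Matrix,
    donor_out: Matrix,
    neuron_ids: Sequence[int],
) -> Tuple[Matrix, Matrix]:
    """Copy selected MLP neurons from a donor layer into a copy of a target layer.

    Hypothetical layout (an assumption, not taken from the paper):
      w_in  is d_ff x d_model, so row i holds the incoming weights of neuron i;
      w_out is d_model x d_ff, so column i holds the outgoing weights of neuron i.
    """
    if _shape(target_in) != _shape(donor_in) or _shape(target_out) != _shape(donor_out):
        raise ValueError("donor and target layers must have the same shapes")
    # Work on copies so the original target weights stay untouched.
    new_in = deepcopy(target_in)
    new_out = deepcopy(target_out)
    for i in neuron_ids:
        # Replace neuron i's incoming weights with the donor's.
        new_in[i] = list(donor_in[i])
        # Replace neuron i's outgoing weights (column i) with the donor's.
        for row in range(len(new_out)):
            new_out[row][i] = donor_out[row][i]
    return new_in, new_out


if __name__ == "__main__":
    # Toy layer with d_model = 2 and d_ff = 3; transfer only neuron 1.
    target_in = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    target_out = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
    donor_in = [[9.0, 9.0], [8.0, 8.0], [7.0, 7.0]]
    donor_out = [[9.0, 8.0, 7.0], [9.0, 8.0, 7.0]]
    new_in, new_out = transfer_neurons(target_in, target_out, donor_in, donor_out, [1])
    print(new_in)   # [[0.0, 0.0], [8.0, 8.0], [2.0, 2.0]]
    print(new_out)  # [[0.0, 8.0, 2.0], [0.0, 8.0, 2.0]]
```

Because only the selected rows and columns change, every other neuron in the target keeps its original weights. That locality is what makes a neuron-level edit modular in this toy setting.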

Key Contributions

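Based on the abstract, the paper's main contributions are:

  • Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM.
  • Modular, function-level adaptation at the neuron level, supporting both the addition and the deletion of a function.
  • An evaluation on seven popular LLMs across three applications (safety disalignment, alignment enhancement, and bias removal), with performance degradation below 1% for most models and consistent gains over five baselines.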

Methodology

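Only a high-level description is available here: CNT operates at the neuron level, transferring a minimal subset of neurons from a donor LLM to a target LLM in order to add or delete a safety-oriented function. Refer to the full paper for how the neurons are selected and how the transfer is carried out.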
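
The abstract states that CNT transfers a minimal subset of neurons but not how that subset is found. As a hypothetical illustration only, the sketch below scores each neuron by how differently it activates on two prompt sets, then greedily keeps the fewest top-scoring neurons that cover a chosen share of the total score. The scoring rule, the coverage threshold, and the names neuron_scores and minimal_subset are all assumptions; the paper's actual selection criterion may differ.

```python
from typing import List, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def neuron_scores(acts_with: List[List[float]], acts_without: List[List[float]]) -> List[float]:
    """Score each neuron by the absolute difference of its mean activation on two prompt sets.

    Each argument holds one activation vector (length d_ff) per prompt, e.g. prompts
    that do and do not exercise the safety function. This scoring rule is hypothetical.
    """
    d_ff = len(acts_with[0])
    return [
        abs(mean([a[i] for a in acts_with]) - mean([a[i] for a in acts_without]))
        for i in range(d_ff)
    ]


def minimal_subset(scores: Sequence[float], coverage: float = 0.9) -> List[int]:
    """Pick the fewest neurons whose (non-negative) scores sum to at least `coverage` of the total.

    Taking neurons in descending score order is optimal here: for any k, the top-k
    neurons have the largest possible score sum.
    """
    total = sum(scores)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    accumulated = 0.0
    for i in ranked:
        if accumulated >= coverage * total:
            break
        chosen.append(i)
        accumulated += scores[i]
    return sorted(chosen)


if __name__ == "__main__":
    acts_with = [[1.0, 0.0, 5.0, 0.2], [1.0, 0.0, 5.0, 0.2]]
    acts_without = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    scores = neuron_scores(acts_with, acts_without)
    print(scores)                  # [1.0, 0.0, 4.0, 0.2]
    print(minimal_subset(scores))  # [0, 2]
```

Raising coverage captures more of the scored difference at the cost of a larger transplant; lowering it keeps the edit smaller.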

Practical Implications

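CNT points toward adjusting a deployed model's safety behavior after the fact by reusing functionality that already exists in open-source models, rather than collecting new data and retraining. Because only a small set of neurons is changed, the adaptation is modular and, according to the reported results, costs little in overall performance.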

Authors

  • Yue Zhao
  • Yujia Gong
  • Ruigang Liang
  • Shenchen Zhu
  • Kai Chen
  • Xuejing Yuan
  • Wangjun Zhang

Paper Information

  • arXiv ID: 2603.18449v1
  • Categories: cs.CR, cs.SE
  • Published: March 19, 2026