[Paper] Text Style Transfer with Parameter-efficient LLM Finetuning and Round-trip Translation

Published: February 16, 2026

Source: arXiv - 2602.15013v1

Overview

The paper introduces a fresh take on Text Style Transfer (TST) by fine‑tuning large language models (LLMs) in a parameter‑efficient way and by generating synthetic parallel data through round‑trip translation. By turning monolingual text into a “neutral” version and then re‑styling it, the authors achieve higher quality style transfer than prompting‑only baselines—making the technique attractive for real‑world applications where style‑specific corpora are scarce.

Key Contributions

  • Parameter‑efficient LLM finetuning for TST, avoiding the cost of full model retraining.
  • Round‑trip translation pipeline that creates pseudo‑parallel “neutral ↔ styled” sentence pairs from monolingual data.
  • Unified neutral style used both during training and inference, simplifying the model’s learning objective.
  • Retrieval‑augmented generation (RAG) integration to preserve domain‑specific terminology and proper names during style conversion.
  • Comprehensive evaluation across four domains, showing consistent gains in BLEU and style‑accuracy over zero‑shot prompting and few‑shot in‑context learning (ICL).

Methodology

  1. Data Neutralization – Each sentence from a monolingual corpus is translated to a pivot language (e.g., English → German) and back again. The back‑translation tends to strip away stylistic markers while preserving content, yielding a “neutral” version of the original text.
  2. Synthetic Parallel Creation – The original (styled) sentence and its neutral counterpart form a pseudo‑parallel pair (neutral ↔ styled). Repeating this for many sentences builds a sizable training set without any human‑annotated style pairs.
  3. Parameter‑efficient Finetuning – Instead of updating all model weights, the authors employ adapters / LoRA‑style modules that add a small trainable matrix on top of a frozen LLM. This reduces GPU memory and training time dramatically.
  4. Training Objective – The model learns to map neutral inputs to the target style (e.g., formal → informal) using standard seq2seq loss. Because the neutral form is shared across all styles, the same finetuned backbone can be reused for multiple style targets.
  5. Retrieval‑augmented Generation (RAG) – At inference time, a lightweight index of domain‑specific terms (e.g., product names, technical jargon) is queried. Retrieved snippets are injected into the decoder context, helping the model keep critical terminology unchanged while still applying the desired style.
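Steps 1–2 above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: `translate` is a placeholder for a real MT system (a neural model or API), shown here as a pass-through stub so the pipeline shape is runnable.

```python
def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder for a real MT system (e.g., an NMT model or API call).
    # A real implementation returns a translation; the stub passes text
    # through unchanged so the pipeline below can run end to end.
    return text

def neutralize(sentence: str, pivot: str = "de") -> str:
    """Round-trip a sentence through a pivot language (e.g., English ->
    German -> English) to strip stylistic markers while keeping content."""
    pivoted = translate(sentence, src="en", tgt=pivot)
    return translate(pivoted, src=pivot, tgt="en")

def build_pseudo_parallel(corpus):
    """Pair each original (styled) sentence with its neutral counterpart,
    yielding (neutral, styled) training pairs with no human annotation."""
    return [(neutralize(s), s) for s in corpus]

pairs = build_pseudo_parallel(["Hey, grab the docs real quick!"])
```

With a real translation backend, the neutral side of each pair would differ from the styled side; the finetuned model then learns the neutral → styled mapping from these pairs.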

Results & Findings

| Domain | BLEU (baseline) | BLEU (proposed) | Style Accuracy ↑ |
|---|---|---|---|
| Formal → Casual | 18.2 | 24.7 | +12 % |
| Academic → Blog | 15.6 | 22.1 | +10 % |
| Technical → Conversational | 17.0 | 23.5 | +13 % |
| Legal → Plain Language | 14.3 | 20.8 | +11 % |
  • The parameter‑efficient finetuned model outperformed zero‑shot prompting and few‑shot ICL by 6–8 BLEU points on average.
  • Style‑accuracy (the proportion of outputs correctly classified into the target style) improved by 10–13 %, confirming that the model isn’t just copying content but truly altering style.
  • Adding RAG raised terminology preservation from ~78 % to >92 % across domains, demonstrating that the retrieval component effectively mitigates hallucination of names and domain‑specific words.
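The terminology-preservation mechanism behind the last finding can be sketched as a lightweight term index queried at inference time. This is a simplified illustration, not the paper's implementation: the index here uses exact substring matching, and all names (`build_term_index`, `augment_prompt`, the example terms) are hypothetical.

```python
def build_term_index(terms):
    # Lowercased lookup of protected domain terms (product names, jargon).
    return {t.lower(): t for t in terms}

def retrieve_terms(sentence, index):
    # Exact-match retrieval; a real system might use embeddings or fuzzy
    # matching instead of substring lookup.
    lowered = sentence.lower()
    return [canonical for key, canonical in index.items() if key in lowered]

def augment_prompt(sentence, index, style="casual"):
    # Inject retrieved terms into the decoder context so the model keeps
    # them verbatim while restyling the rest of the sentence.
    hits = retrieve_terms(sentence, index)
    guard = f"Keep these terms verbatim: {', '.join(hits)}. " if hits else ""
    return f"{guard}Rewrite in a {style} style: {sentence}"

index = build_term_index(["FooCorp X-200", "latency SLA"])
prompt = augment_prompt("The FooCorp X-200 meets the latency SLA.", index)
```

The retrieved snippets act as soft constraints in the context rather than hard decoding constraints, which keeps the approach model-agnostic.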

Practical Implications

  • Rapid deployment of style‑aware assistants – Companies can adapt a single LLM to multiple brand voices (formal, friendly, technical) without collecting costly parallel corpora.
  • Localization pipelines – By neutralizing source text, the same model can be reused for different target languages or dialects, cutting down on per‑language engineering effort.
  • Content moderation & rewriting tools – Platforms that need to rewrite user‑generated content (e.g., to remove profanity or enforce a corporate tone) can leverage the neutral‑style backbone for consistent, low‑latency transformations.
  • Preserving critical terminology – The RAG extension makes the approach safe for regulated industries (legal, medical, finance) where mis‑rendering a term can have serious consequences.
  • Cost‑effective scaling – Since only a small adapter is trained, the method fits on a single GPU, enabling smaller teams and startups to fine‑tune large LLMs without heavy infrastructure.
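The "small adapter" math behind that cost argument can be made concrete with a minimal NumPy sketch of a LoRA-style update. This follows the common LoRA convention (A initialized randomly, B to zero, so the update starts at zero); the dimensions are illustrative, not the paper's configuration.

```python
import numpy as np

def lora_delta(A, B, alpha, r):
    # Low-rank weight update: only A (r x d_in) and B (d_out x r) are
    # trained; the pretrained weight W stays frozen.
    return (alpha / r) * (B @ A)

d_in, d_out, r, alpha = 1024, 1024, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))          # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))       # trainable "down" matrix
B = np.zeros((d_out, r))                        # trainable "up" matrix (zero init)

W_eff = W + lora_delta(A, B, alpha, r)          # effective weight at inference

trainable = A.size + B.size                     # 2 * r * d  parameters
full = W.size                                   # d_out * d_in parameters
```

With rank r = 8 on a 1024×1024 layer, the trainable parameters are under 2 % of the full matrix, which is why the method fits on a single GPU.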

Limitations & Future Work

  • Quality of neutralization depends on the pivot language and translation system; poor back‑translation can introduce noise that harms style learning.
  • The approach assumes that style can be largely stripped away by translation, which may not hold for highly idiomatic or culturally bound styles.
  • Retrieval‑augmented generation currently relies on a static term index; dynamic or domain‑drift scenarios could require more sophisticated knowledge‑base updates.
  • Future research directions include exploring multilingual pivots, adaptive retrieval mechanisms, and extending the method to multimodal style transfer (e.g., code style or UI text).

Authors

  • Ruoxi Liu
  • Philipp Koehn

Paper Information

  • arXiv ID: 2602.15013v1
  • Categories: cs.CL
  • Published: February 16, 2026