[Paper] PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Published: (June 8, 2026 at 12:19 PM EDT)
2 min read
Source: arXiv

Source: arXiv - 2606.09697v1

Overview

Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still failing to support the needs of the person behind the request. We present PsychoSafe, a psychologically-informed refusal framework that reframes refusal as structured supportive communication grounded in evidence-based intervention strategies. To develop PsychoSafe, we construct a corpus of 8019 prompt-response pairs spanning five psychologically salient risk domains and apply prompting and parameter-efficient fine-tuning to Qwen 3.5 27B. On a balanced validation set of 500 prompts, evaluated with an LLM judge and validated through human ratings, PsychoSafe prompting improves overall refusal quality by 28.1% over a generic baseline, with particularly strong gains in external resource referral (+46.8%) and psychological grounding (+34.8%), while preserving downstream performance on non-refusal tasks. Fine-tuning achieves near-perfect refusal and resource-referral rates but reduces response relevance. Additional evaluations on SORRY-Bench and XSTest show strong in-domain robustness but limited out-of-domain generalization, suggesting that future work should diversify fine-tuning data to help models apply interventions selectively rather than schematically.

Key Contributions

This paper presents research in the following areas:

  • cs.CL

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

This research contributes to the advancement of cs.CL.

Authors

  • Gianluca Barmina
  • Federico Torrielli
  • Sven Harms
  • Jacob Nielsen
  • Felix Mächtle
  • Stine Lyngsø Beltoft
  • Peter Schneider-Kamp
  • Thomas Eisenbarth
  • Lukas Galke Poech
  • Anne Lauscher

Paper Information

  • arXiv ID: 2606.09697v1
  • Categories: cs.CL
  • Published: June 8, 2026
  • PDF: Download PDF
0 views
Back to Blog

Related posts

Read more »