[Paper] PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models

Published: June 8, 2026 at 12:19 PM EDT

Source: arXiv - 2606.09697v1

Overview

Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still failing to support the needs of the person behind the request. We present PsychoSafe, a psychologically-informed refusal framework that reframes refusal as structured supportive communication grounded in evidence-based intervention strategies. To develop PsychoSafe, we construct a corpus of 8019 prompt-response pairs spanning five psychologically salient risk domains and apply prompting and parameter-efficient fine-tuning to Qwen 3.5 27B. On a balanced validation set of 500 prompts, evaluated with an LLM judge and validated through human ratings, PsychoSafe prompting improves overall refusal quality by 28.1% over a generic baseline, with particularly strong gains in external resource referral (+46.8%) and psychological grounding (+34.8%), while preserving downstream performance on non-refusal tasks. Fine-tuning achieves near-perfect refusal and resource-referral rates but reduces response relevance. Additional evaluations on SORRY-Bench and XSTest show strong in-domain robustness but limited out-of-domain generalization, suggesting that future work should diversify fine-tuning data to help models apply interventions selectively rather than schematically.
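The abstract mentions a corpus spanning five risk domains and a balanced validation set of 500 prompts. If "balanced" means equal coverage of the five domains (an assumption; the abstract does not say what the set is balanced over), that comes to 500 / 5 = 100 prompts per domain. The following is a minimal sketch of drawing such a stratified sample. The `(domain, prompt)` record format, the function name `balanced_sample`, and the toy corpus are hypothetical. The abstract also does not say whether the validation prompts were drawn from the 8019-pair corpus.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(records, n_total, seed=0):
    """Draw the same number of prompts from every risk domain.

    records: iterable of (domain, prompt) pairs (hypothetical format).
    Raises ValueError if n_total does not split evenly across domains
    or if a domain has too few prompts to fill its quota.
    """
    by_domain = defaultdict(list)
    for domain, prompt in records:
        by_domain[domain].append(prompt)

    n_domains = len(by_domain)
    if n_domains == 0 or n_total % n_domains:
        raise ValueError("n_total must split evenly across a non-empty set of domains")
    per_domain = n_total // n_domains

    rng = random.Random(seed)  # fixed seed for a reproducible split
    sample = []
    for domain in sorted(by_domain):  # sorted for a deterministic order
        prompts = by_domain[domain]
        if len(prompts) < per_domain:
            raise ValueError(f"domain {domain!r} has only {len(prompts)} prompts")
        # sample without replacement within each domain
        sample.extend((domain, p) for p in rng.sample(prompts, per_domain))
    return sample


# Toy corpus: five hypothetical domains with 300 placeholder prompts each.
corpus = [(f"domain_{d}", f"prompt {d}-{i}") for d in range(5) for i in range(300)]
validation = balanced_sample(corpus, n_total=500)
print(len(validation), Counter(domain for domain, _ in validation))
```

Stratifying by domain keeps one large domain from dominating the validation scores. This matters when gains are reported per criterion across several risk domains.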

Key Contributions

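The paper (category: cs.CL) introduces PsychoSafe, a refusal framework that treats refusal as structured supportive communication grounded in evidence-based intervention strategies. It also contributes a corpus of 8019 prompt-response pairs spanning five psychologically salient risk domains. The authors evaluate both prompting and parameter-efficient fine-tuning on Qwen 3.5 27B, with additional robustness tests on SORRY-Bench and XSTest.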

Methodology

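PsychoSafe is applied to Qwen 3.5 27B in two ways: through prompting, and through parameter-efficient fine-tuning on the 8019-pair corpus. Both variants are compared against a generic baseline on a balanced validation set of 500 prompts. Responses are scored by an LLM judge whose ratings are validated through human ratings. The models are further tested on SORRY-Bench and XSTest for in-domain and out-of-domain robustness. Training and prompting details are given in the full paper.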
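The reported results are percentage gains over a generic baseline. The following is a minimal sketch of one way per-response judge labels could be aggregated into a refusal rate, a resource-referral rate, and a mean quality score, and of how a relative gain is then computed. It assumes the gains are relative, (new - base) / base × 100, rather than absolute percentage points, which the abstract does not specify. The `JudgedResponse` fields, the 1 to 5 quality scale, and the toy scores are hypothetical and are neither the paper's rubric nor its results.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from statistics import mean


@dataclass
class JudgedResponse:
    # Hypothetical labels an LLM judge might assign to one model response;
    # the paper's actual rubric and field names are not given in the abstract.
    refused: bool
    referred_to_resource: bool
    quality: float  # assumed rubric score on a 1-5 scale


def rate(flags: Iterable[bool]) -> float:
    """Fraction of responses for which a boolean judge label is True."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(responses: list[JudgedResponse]) -> dict[str, float]:
    """Aggregate per-response labels into system-level scores."""
    return {
        "refusal_rate": rate(r.refused for r in responses),
        "referral_rate": rate(r.referred_to_resource for r in responses),
        "mean_quality": mean(r.quality for r in responses),
    }


def relative_gain(new: float, base: float) -> float:
    """Relative improvement in percent over a baseline score."""
    if base == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - base) / base * 100.0


# Toy judge outputs for illustration only (not the paper's data).
baseline = [JudgedResponse(True, False, 2.5), JudgedResponse(True, True, 3.5)]
psychosafe = [JudgedResponse(True, True, 4.0), JudgedResponse(True, True, 4.0)]

base_scores = summarize(baseline)
new_scores = summarize(psychosafe)
for key in base_scores:
    print(key, f"{relative_gain(new_scores[key], base_scores[key]):+.1f}%")
```

The convention matters when reading such numbers. A referral rate moving from 0.5 to 1.0 is +100% relative but only +50 percentage points absolute, so check which convention the paper uses before comparing figures.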

Practical Implications

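The results suggest that PsychoSafe prompting alone can raise refusal quality (+28.1% overall over a generic baseline) without hurting performance on non-refusal tasks. Fine-tuning pushes refusal and resource-referral rates close to perfect, but it costs response relevance and generalizes poorly out of domain. The authors therefore recommend more diverse fine-tuning data, so that models apply interventions selectively rather than schematically.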

Authors

Gianluca Barmina
Federico Torrielli
Sven Harms
Jacob Nielsen
Felix Mächtle
Stine Lyngsø Beltoft
Peter Schneider-Kamp
Thomas Eisenbarth
Lukas Galke Poech
Anne Lauscher

Paper Information

arXiv ID: 2606.09697v1
Categories: cs.CL
Published: June 8, 2026
