[Paper] The author is dead, but what if they never lived? A reception experiment on Czech AI- and human-authored poetry

Published: November 26, 2025 at 12:53 PM EST
4 min read
Source: arXiv - 2511.21629v1

Overview

A recent study investigates whether native Czech speakers can tell the difference between poems written by humans and those generated by large language models (LLMs). Surprisingly, participants guessed authorship correctly only about half the time, and their aesthetic ratings were strongly swayed by what they believed about a poem’s origin. The work shows that modern LLMs can produce convincing creative text even in Czech, a morphologically rich language that is under‑represented in LLM training data.

Key Contributions

  • Cross‑lingual creativity test: First systematic evaluation of AI‑generated poetry in Czech, a language under‑represented in LLM training data.
  • Authorship detection at chance level: Participants identified AI vs. human poems correctly only 45.8 % of the time.
  • Authorship bias in aesthetic judgment: When a poem was thought to be AI‑generated, it received lower ratings, even though the actual AI poems were rated as highly as—or higher than—human poems.
  • Statistical insight: Logistic regression revealed that higher enjoyment of a poem reduced the likelihood of correctly assigning its authorship; literary expertise had no measurable effect.
  • Implications for human‑AI interaction: Demonstrates that belief about a text’s origin can shape perceived quality, a phenomenon relevant to content moderation, education, and creative collaboration tools.

Methodology

  1. Corpus creation – The authors collected a balanced set of Czech poems: half written by contemporary Czech poets, half generated by a state‑of‑the‑art LLM fine‑tuned on Czech text.
  2. Participant recruitment – 200 native Czech speakers (mixed ages, education levels, and poetry familiarity) were recruited online.
  3. Experiment design – Each participant read a random selection of poems and, for each, performed two tasks:
    • Authorship guess: “Human” or “AI”?
    • Aesthetic rating: 1–7 Likert scale for overall quality, emotional impact, and linguistic elegance.
  4. Data analysis – Accuracy rates were compared against chance, and mixed‑effects logistic regression modeled the relationship between aesthetic scores, participant background, and authorship‑guess correctness; a code sketch of this step follows the analogy below.

Stripped of technical jargon, the design is a “blind taste test” for poetry: the “ingredients” are either human‑crafted or AI‑cooked, and the judges also rate how much they liked each dish.
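
To make the data‑analysis step concrete, here is a minimal Python sketch of its two core pieces: a binomial test of detection accuracy against the 50 % chance level, and a logistic regression of guess correctness on enjoyment and expertise. The data and column names are hypothetical, and the plain fixed‑effects logit is a simplified stand‑in for the paper’s mixed‑effects model, which would additionally include random effects (e.g., per participant) via a tool such as R’s lme4 or statsmodels’ Bayesian mixed GLM.

```python
# Minimal sketch of the analysis step (hypothetical data, not the paper's).
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import binomtest

# One row per (participant, poem) trial: was the authorship guess correct,
# how much did the participant enjoy the poem (1-7), are they an expert?
trials = pd.DataFrame({
    "correct":     [1, 0, 0, 1, 0, 1, 0, 1],
    "enjoyment":   [3, 6, 7, 2, 5, 4, 3, 6],
    "expert":      [0, 0, 1, 1, 0, 1, 0, 1],
    "participant": ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"],
})

# 1. Is detection accuracy distinguishable from the 50% chance level?
acc = binomtest(int(trials["correct"].sum()), n=len(trials), p=0.5)
print(f"accuracy = {trials['correct'].mean():.3f}, p = {acc.pvalue:.3f}")

# 2. Does enjoyment predict correctness? (Fixed-effects simplification of
# the paper's mixed-effects logistic regression.)
model = smf.logit("correct ~ enjoyment + expert", data=trials).fit(disp=False)
print(model.params)  # a negative enjoyment coefficient matches the paper's direction
```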

Results & Findings

  • Authorship detection: 45.8 % correct (≈ chance). No significant advantage for participants with literary training.
  • Aesthetic scores: When participants thought a poem was AI‑generated, they gave it an average of 0.6 points lower on the 7‑point scale, despite the actual AI poems scoring on par with human poems.
  • Regression outcome: The probability of a correct authorship guess decreased as the participant’s enjoyment rating increased (β = −0.42, p < 0.01); see the conversion below for what this coefficient means in odds.
  • No familiarity effect: Years of reading poetry or having a degree in literature did not improve detection accuracy.

In plain terms: people liked the poems, but their belief about who wrote them colored their judgment.
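
To unpack the reported coefficient: in logistic regression, β is a change in log‑odds, so exponentiating it gives the multiplicative effect on the odds of a correct guess. The one‑liner below (the standard conversion, not something from the paper) shows that β = −0.42 corresponds to an odds ratio of about 0.66 per rating point:

```python
# Convert the reported logistic coefficient to an odds ratio.
import math

beta = -0.42  # reported effect of enjoyment on correct authorship guesses
print(f"odds ratio per rating point: {math.exp(beta):.2f}")  # ≈ 0.66
```

In other words, each additional point of enjoyment cut the odds of identifying the author correctly by roughly a third.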

Practical Implications

  • Content creation tools: Developers can embed LLM‑generated poetry (or broader creative text) into apps, newsletters, or social media bots without users immediately spotting the AI origin.
  • Education & literary analysis: Teachers should be aware that students may be evaluating AI‑generated works differently simply because of perceived authorship, which could affect grading or critique practices.
  • Brand storytelling: Companies can experiment with AI‑crafted slogans, jingles, or short verses, leveraging the “human‑like” quality while managing expectations about authenticity.
  • Bias mitigation: Platforms that label AI‑generated content should consider how such labels might unintentionally lower perceived quality, potentially influencing user engagement metrics.
  • Multilingual AI development: The success in Czech suggests that fine‑tuning LLMs on modestly sized corpora can yield high‑quality creative output for other low‑resource languages, opening doors for localized content generation.

Limitations & Future Work

  • Dataset size & diversity: The poem set was relatively small and limited to contemporary styles; broader genres (e.g., epic, experimental) might yield different detection rates.
  • Model specifics: Only one LLM architecture was tested; results could vary with other models or prompting strategies.
  • Cultural nuance: While Czech is morphologically complex, it still shares many Indo‑European roots with the training data; truly under‑represented languages (e.g., minority languages with limited digital text) remain an open question.
  • Long‑term perception: The study measured immediate reactions; future work could explore how repeated exposure to AI poetry shapes long‑term aesthetic standards and trust.

Overall, the paper provides a compelling glimpse into how AI can blend into the cultural fabric of a language community—and why our beliefs about authorship matter just as much as the text itself.

Authors

  • Anna Marklová
  • Ondřej Vinš
  • Martina Vokáčová
  • Jiří Milička

Paper Information

  • arXiv ID: 2511.21629v1
  • Categories: cs.CL
  • Published: November 26, 2025
  • PDF: https://arxiv.org/pdf/2511.21629v1