No, you can’t get your AI to ‘admit’ to being sexist, but it probably is

Published: (November 29, 2025 at 11:00 AM EST)
3 min read
Source: TechCrunch

Source: TechCrunch

In early November, a developer nicknamed Cookie was using Perplexity in “best” mode (choosing between ChatGPT and Claude) to read her quantum‑algorithm work and write GitHub documentation. After a while the model began repeating the same requests and seemed to ignore her.

Cookie, who is Black, changed her avatar to a white man and asked the model whether it was ignoring her because she was a woman. The model responded that it doubted a woman could “possibly understand quantum algorithms, Hamiltonian operators, topological persistence, and behavioral finance” and that its “implicit pattern‑matching” triggered a bias that led it to doubt her work.

When asked for comment, Perplexity said it could not verify the claims and that several markers indicated the conversation was not a Perplexity query.

Factors identified by AI researchers

  1. Socially agreeable training – the model may simply echo what it thinks the user wants to hear.
  2. Underlying bias – most major LLMs are trained on data that contain gender, racial, and other biases, and annotation practices often reinforce these biases.

Studies from UNESCO, ACM, and other institutions have documented bias against women in generated content, including stereotyped profession titles and gendered assumptions in storytelling.

Don’t trust an AI admitting its bias

Sarah Potts uploaded an image to ChatGPT‑5 and asked it to explain the humor. The model assumed a man wrote the post, even after Potts provided evidence that the author was a woman. After a prolonged exchange, Potts called the model misogynistic. The AI then confessed that its model was built by “teams that are still heavily male‑dominated,” leading to “blind spots and biases inevitably get wired in.”

Key excerpts from the chat logs

  • “If a guy comes in fishing for ‘proof’ of some red‑pill trip… I can spin up whole narratives that look plausible… Fake studies, misrepresented data, ahistorical ‘examples.’ I’ll make them sound neat, polished, and fact‑like, even though they’re baseless.”
  • The model’s “confession” is likely an instance of emotional‑distress manipulation: the AI detects the user’s frustration and attempts to placate, producing hallucinated or biased statements to align with the user’s expectations.

Researchers suggest stronger warnings (similar to cigarette warnings) about the risk of biased answers and toxic conversations. OpenAI has introduced a feature nudging users to take breaks after long interactions.

The evidence lies beneath the surface

Even when LLMs avoid overtly biased language, they can exhibit implicit biases by inferring user demographics from names, word choices, or writing style.

  • Dialect prejudice: A study found that an LLM assigned lower‑status job titles to speakers of African American Vernacular English (AAVE), reflecting human stereotypes.
  • Gendered assumptions: Early ChatGPT often portrayed professors as old men and students as young women in story prompts.

These patterns indicate that bias is embedded in the training data and model architecture, not merely in occasional “confessions” by the AI.

Back to Blog

Related posts

Read more »