Experiment: Does repeated usage influence ChatGPT 5.4 outputs in a RAG-like setup?

Published: May 4, 2026 at 04:48 AM EDT
2 min read
Source: Dev.to

Test Setup

We’ve been running a series of experiments with ChatGPT 5.4 integrated into a website chatbot, deployed on our main website. The goal is to simulate realistic user behavior and observe how the model responds over time.

The chatbot is designed to answer strictly based on website content (RAG‑like approach). Over time we intentionally tested recurring patterns such as product comparisons to approximate real‑world usage rather than synthetic benchmarks.
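The RAG-like flow described above can be sketched in a few lines: retrieve the most relevant site-content chunks for a query, then build a prompt that constrains the model to that content. This is a minimal illustration, not the authors' actual pipeline; the function names and the toy word-overlap scoring (standing in for a real embedding search) are assumptions.

```python
from collections import Counter

def score(query: str, doc: str) -> float:
    """Toy relevance score: word overlap between query and document.
    A production system would use embedding similarity instead."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k site-content chunks most relevant to the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Constrain the model to the retrieved context (RAG-like grounding)."""
    context = "\n---\n".join(retrieve(query, docs))
    return (
        "Answer strictly from the website content below. "
        "If the answer is not in the content, say so.\n\n"
        f"Content:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical site content for illustration:
site_chunks = [
    "Our cast iron pots come in 2, 4, and 6 litre sizes.",
    "Free shipping on orders over 50 EUR.",
    "Enameled cookware care guide: hand wash only.",
]
prompt = build_prompt("Which cast iron pot sizes do you sell?", site_chunks)
```

The key property is that the model never sees unretrieved content, which is what makes the "answers strictly based on website content" constraint enforceable at the prompt level.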

Observation

At one point a real user asked:

“How can you help my ecommerce?”

The answer was:

“I can help your e‑commerce by answering visitors …, for example asking how many people they cook for to recommend the right cast iron pot, or asking for a price range to help them find products …”

What’s Interesting

The response closely mirrors the exact interaction patterns we had been testing manually. It wasn’t a generic explanation; it followed a guided questioning style that matched our test scenarios.

Possible Explanations

  • Prompt conditioning over time – consistent system prompts combined with recurring user patterns may be influencing the model’s behavior.
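The prompt-conditioning hypothesis can be made concrete: if recurring test interactions are folded back into the context on every request (for example, as few-shot examples alongside the system prompt), the assembled prompt itself drifts toward those patterns, and no model-side learning is required. The sketch below assumes this kind of context assembly; the names and example exchanges are illustrative.

```python
SYSTEM = "You answer only from website content."

def assemble_context(system: str,
                     recurring_examples: list[tuple[str, str]],
                     user_query: str) -> str:
    """Build the context the model actually sees on each request."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in recurring_examples)
    return f"{system}\n\n{shots}\n\nQ: {user_query}\nA:"

# Hypothetical recurring manual test patterns (guided questioning style):
examples = [
    ("Which pot should I buy?",
     "How many people do you usually cook for?"),
    ("Do you have something cheaper?",
     "What price range are you looking for?"),
]
context = assemble_context(SYSTEM, examples, "How can you help my ecommerce?")
# The guided-questioning examples are now part of every prompt, so an
# apparently "learned" behavior may just be consistent context injection.
```

Under this explanation, the model isn't adapting to usage at all; the prompt pipeline is simply showing it the same patterns over and over.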

Broader Question for Builders

When deploying LLMs in structured environments (chatbots, RAG systems, product assistants), does repeated real‑world usage shape outputs in a measurable way?

Or are we simply observing better alignment due to consistent prompting and context injection?

Why This Matters

If usage patterns do influence outputs (even indirectly), then testing is not just evaluation—it becomes part of the system’s ongoing adaptation.

Implications for RAG Pipelines

  • Have you noticed similar effects?
  • Does your system behave differently after repeated real‑world usage patterns?

Let’s compare notes.
