AI Hallucination: a symptom or root cause?

Last updated: Dec 04, 2025

Have you ever wondered why AI hallucination has become so common these days?

Have you ever wondered why earlier versions of AI were far better and straight to the point, even when given the same prompt?

The reason is excessive hallucination by the GenAI models.

Even if you write "it", there is a chance the GenAI model will interpret it as "IT" (Information Technology) rather than as a pronoun.

One day I sat thinking about why this is so. Then I realized AI hallucination is a symptom, not the reason or root cause of the behaviour. If that is the case, why has it become so frequent, and why do models underperform even when you've mastered your prompting skills?

What I found was shocking.

It's not the AI algorithms; it's how we have stored and fed the data inside the LLMs.

I've been using GenAI since May 2023; I started using it within six months of its first official launch. Responses in those days used to be very crisp, precise, and straightforward. But today, we get responses that are more sugar-coated, repetitive in pattern, and lean on assumptions rather than facts.

There are 3 reasons behind this:

  1. How data is stored (Technical)
  2. What data is stored (Informational)
  3. Why data is stored (Logical)

Let's start with the technical aspect of it.

How data is stored.

LLM models (or simply, AI) save data in something like a blurred image format. Think of a photo you clicked in dim light, or an image compressed before saving. Likewise, an LLM stores data in a way that captures the essence but loses the fine details, or struggles to rebuild the sequence of a context. This leads to hallucinations.

This might sound like a technical glitch, but it's not. It's a deliberate design choice to optimize storage and retrieval speed. However, this trade-off comes at the cost of reliability.
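To make the analogy concrete, here is a toy Python sketch of lossy storage. The numbers and the quantization scheme are made up purely for illustration; real LLMs encode information in learned weights, not in explicit lookup tables like this. The point is only that aggressive compression keeps the gist but discards fine detail.

```python
import numpy as np

# Toy illustration: a few "precise facts" stored with heavy compression.
original = np.array([3.14159, 2.71828, 1.41421, 1.61803])  # fine details

# Quantize to just 4 levels between min and max (very lossy).
levels = 4
lo, hi = original.min(), original.max()
step = (hi - lo) / (levels - 1)
compressed = np.round((original - lo) / step)   # store tiny integers
reconstructed = compressed * step + lo          # "recall" them later

print("original:     ", original)
print("reconstructed:", reconstructed)
# The reconstruction preserves the rough ordering (the "essence"),
# but the precise values (the "fine details") are gone.
```

Run it and you will see the recalled values are only approximations of the originals, which is the flavour of error that surfaces as hallucination.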

Next is the informational aspect of it.

What data is stored.

LLM models are trained on vast amounts of data from diverse sources, including content written using the same GenAI tools. This makes some of the data less reliable or accurate than the rest. When LLM models are exposed to low-quality or biased data, they tend to learn and replicate those biases in their responses. That is what has happened, and it is still happening today.

The same goes for outdated data. LLM models may not have access to the latest information, leading to hallucinations when they attempt to provide answers based on obsolete or incomplete data.

Let me give you a real example of my own. Ask any GPT model about my "GSD Triangle", or what it stands for, and it will confidently explain it to you. In reality, it is my internal framework, and I have not revealed its specifics on any public platform.

Such instances happen because the LLM contains related data points or keywords from various sources and tries to connect the dots, leading to hallucinations.

Finally, we have the logical aspect of it, which is about more than just logic.

"Why data is stored" is more of a governance problem.

After ChatGPT gained popularity and Bard (now Gemini) launched in response, many startups and enterprises rushed to join the race. The key focus and top priority was quantity of data over quality, whether in collection or storage. It became an F1 race run on public streets, where everyone wants to win even if it destroys the harmony and sanctity of society.

The gist is that there was a lack of standardized protocols for data validation and verification. We've already seen a few instances of misleading data in public news reports. Without proper checks, erroneous or misleading information found its way into training datasets, which only exacerbated the AI hallucination problem.
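To make the idea of such a protocol concrete, here is a minimal sketch of the kind of checks a training-data pipeline could run before a record is accepted. The record fields, source categories, and thresholds are my own assumptions for illustration, not any vendor's actual pipeline.

```python
from datetime import date

# Hypothetical rules, purely for illustration.
TRUSTED_SOURCES = {"peer_reviewed", "official_docs", "curated_wiki"}
MAX_AGE_YEARS = 3

def passes_basic_checks(record: dict, today: date = date.today()) -> bool:
    """Tiny example of validation before a record enters a training set."""
    text = record.get("text", "").strip()
    if len(text) < 50:                        # too short to be informative
        return False
    if record.get("source_type") not in TRUSTED_SOURCES:
        return False                          # unverified or unknown origin
    published = record.get("published")       # expected as a datetime.date
    if published is None or (today - published).days > MAX_AGE_YEARS * 365:
        return False                          # stale or undated content
    return True

sample = {
    "text": "A sufficiently long, factual passage from a vetted source goes here.",
    "source_type": "official_docs",
    "published": date(2024, 6, 1),
}
print(passes_basic_checks(sample))  # True under these toy rules
```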

What is the Solution?

To mitigate AI hallucination, it's crucial to address these three aspects as a priority. This includes:

  1. improving data storage techniques to preserve finer details
  2. curating high-quality and up-to-date datasets
  3. implementing robust data governance practices

In business, we must be cautious when deploying GenAI solutions. Relying solely on AI-generated content without human oversight can lead to misinformation and errors. That is why it is essential to keep humans in the loop and to define ownership of the systems and data, so that experts review and validate AI inputs and outputs before they are used in critical applications.
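Here is a minimal sketch of what such a human-in-the-loop gate could look like in code. The function names and the simple approve/reject flow are my own assumptions for illustration; real review workflows usually sit on top of ticketing or annotation tools rather than a console prompt.

```python
# Minimal human-in-the-loop sketch: nothing ships without explicit approval.

def generate_draft(prompt: str) -> str:
    # Placeholder for a call to whichever GenAI model you use.
    return f"[AI draft for: {prompt}]"

def human_review(draft: str) -> bool:
    """An expert approves or rejects the draft before it reaches anything critical."""
    print("Please review the draft:\n", draft)
    return input("Approve? [y/N] ").strip().lower() == "y"

def publish(content: str) -> None:
    print("Published:", content)

def run_pipeline(prompt: str) -> None:
    draft = generate_draft(prompt)
    if human_review(draft):           # the gate: a named reviewer decides
        publish(draft)
    else:
        print("Draft rejected; sent back for rework.")

if __name__ == "__main__":
    run_pipeline("Summarize our Q3 compliance report")
```

The design point is the gate itself: the AI output is treated as a draft, and an accountable human owns the decision to use it.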

I hope this also helps you address one of your biggest fears, that "AI will replace your job". AI is actually on the path to creating more jobs; the question is, are we working in the right direction?
