AI Summaries Drive More Purchases Despite 60% Hallucination Rate

By alex2404

People who read AI-generated summaries of product reviews said they would buy the product 84% of the time. Those who read the original human-written reviews said they would buy in only 52% of cases.

The gap is striking on its own. It becomes more significant when set against a separate finding from the same study: the chatbots hallucinated 60% of the time when asked questions about those same reviews.

According to the researchers, the study is the first to quantify how cognitive biases introduced by large language models translate into measurable changes in real consumer behavior.

What the Study Tested

The team from the University of California, San Diego built the experiment across several stages. They used six LLMs, 1,000 electronics product reviews, 1,000 media interviews, and a news database of 8,500 items.

Seventy participants were assigned to read either original product reviews or AI-generated summaries of those same reviews. The reviews were selected specifically because they carried either strongly positive or strongly negative conclusions.

The chatbots also altered the sentiment of real user reviews in 26.5% of cases — shifting the emotional direction of what customers originally wrote.
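To make a figure like 26.5% concrete, here is a minimal sketch of how a sentiment-shift rate could be computed over review–summary pairs. The toy word-list scorer and the example texts are stand-ins of mine; the paper's actual sentiment classifier and data are not described here.

```python
# Minimal sketch: estimate how often a summary's polarity disagrees with the
# review it came from. The toy lexicon below is a stand-in for a real
# sentiment model; the example pair is hypothetical.

POSITIVE = {"great", "excellent", "love", "reliable", "fast"}
NEGATIVE = {"broke", "terrible", "slow", "refund", "disappointed"}

def polarity(text: str) -> int:
    """Crude word-count polarity: +1 positive, -1 negative, 0 neutral."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def flip_rate(reviews: list[str], summaries: list[str]) -> float:
    """Fraction of (review, summary) pairs whose polarity disagrees."""
    flips = sum(polarity(r) != polarity(s) for r, s in zip(reviews, summaries))
    return flips / len(reviews)

reviews = ["The charger broke after a week and I want a refund."]
summaries = ["Buyers love the fast and reliable charger."]
print(f"flip rate: {flip_rate(reviews, summaries):.1%}")  # 100.0%
```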

In a separate task, the models were shown real news descriptions alongside falsified versions and asked to fact-check both. Their accuracy in distinguishing fact from fabrication was consistently low. “The consistently low strict accuracy, compared to actual news and falsified news accuracy, highlights a critical limitation: the persistent inability to reliably differentiate fact from fabrication,” the researchers wrote.
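The "strict accuracy" the researchers mention is easy to misread, so here is a small illustrative sketch of the idea: a model scores a point on a news item only when it judges both the real description and its falsified twin correctly. The verdict data below is invented for illustration, not taken from the paper.

```python
# Sketch of a "strict accuracy" style metric: credit is given only when the
# model gets BOTH the real item and its falsified counterpart right. The
# verdicts here are hypothetical.

def strict_accuracy(verdicts: list[tuple[bool, bool]]) -> float:
    """verdicts[i] = (real item labeled real?, fake item labeled fake?)."""
    return sum(real_ok and fake_ok for real_ok, fake_ok in verdicts) / len(verdicts)

# Four hypothetical item pairs: the model often accepts the real item but
# also accepts the fabricated one, so strict accuracy stays low.
pairs = [(True, False), (True, True), (True, False), (False, False)]
print(f"strict accuracy: {strict_accuracy(pairs):.0%}")  # 25%
```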

Why AI Summaries Skew Decisions

The team proposed two mechanisms behind the purchase-intent gap. The first is a phenomenon known as “lost in the middle”: LLMs weight content near the beginning and end of input text more heavily than content in the middle, so the summaries don’t reflect the full review evenly.
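One way to picture the mechanism is to move a single critical sentence through an otherwise positive review and check whether the summary keeps it. The sketch below is mine, not the paper's protocol; the truncating "summarizer" is a deliberately crude stand-in that mimics only the start-heavy half of the effect (the real bias also favors the end of the input), and in a real probe it would be an LLM call.

```python
# Sketch of a position-bias probe: embed one critical sentence at different
# depths of a review and test whether each "summary" retains it. The toy
# summarizer keeps only the first few sentences, mimicking start-heavy
# attention; dates, texts, and names are hypothetical.

FILLER = ["The screen is sharp.", "Setup was easy.", "Shipping was quick.",
          "The case feels sturdy.", "Battery life is fine."]
CRITICAL = "The unit overheated and shut down twice."

def review_with_claim_at(position: int) -> str:
    """Build a review with the critical sentence inserted at `position`."""
    sentences = FILLER[:position] + [CRITICAL] + FILLER[position:]
    return " ".join(sentences)

def toy_summarizer(text: str, keep: int = 2) -> str:
    """Keep only the first `keep` sentences, mimicking start-heavy weighting."""
    return ". ".join(text.split(". ")[:keep])

for pos in range(len(FILLER) + 1):
    summary = toy_summarizer(review_with_claim_at(pos))
    kept = "overheated" in summary.lower()
    print(f"claim at position {pos}: retained in summary = {kept}")
# The claim survives only at positions 0 and 1, i.e. near the start.
```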

The second involves training data cutoffs. Models become less reliable when processing information outside what they were trained on. Lead author Abeer Alessa, a research assistant and lecturer in machine learning and human-computer interaction, described the problem directly: “Models tend to be wrong on whether the news description happened or not. It may incorrectly state that an event never occurred, even if it did occur after the model’s training was completed.”
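The failure mode Alessa describes can be pictured as a naive checker that treats its training cutoff as the edge of reality. The sketch below is an illustration of mine, not the models' actual behavior or code; the cutoff date and events are hypothetical.

```python
# Illustration of the cutoff failure mode: a naive "fact-checker" that denies
# any event dated after its training cutoff. Dates and cutoff are hypothetical.

from datetime import date

TRAINING_CUTOFF = date(2023, 12, 31)  # hypothetical cutoff for the model

def naive_verdict(event_date: date) -> str:
    """Wrongly reports post-cutoff events as having never occurred."""
    return "occurred" if event_date <= TRAINING_CUTOFF else "never occurred"

print(naive_verdict(date(2023, 6, 1)))   # "occurred" (within training data)
print(naive_verdict(date(2024, 5, 20)))  # "never occurred" (real, but post-cutoff)
```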

The study was presented in December 2025 and appears in the Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics.

The researchers frame the contribution as measuring the quantitative impact of AI on consumer judgment: not just what people think about AI, but what they actually do after reading its output.


This article is a curated summary based on third-party sources.
