
Study Reveals: Including Evidence in Questions Confuses ChatGPT, Reduces Accuracy

Ira Singh
Khabar Khabaron Ki, 14 April ’24

In a world increasingly reliant on AI for information retrieval and processing, understanding how these systems behave is paramount. A recent study by a team of researchers in Australia has shed light on an unexpected challenge faced by one of the most widely used AI language models, ChatGPT. The study, published in the Journal of Artificial Intelligence Research, examines the impact of including evidence alongside questions posed to ChatGPT. Contrary to the conventional assumption that providing evidence should enhance the model’s accuracy, the research revealed a troubling trend: including evidence actually confused ChatGPT and lowered its accuracy.

According to the research, asking ChatGPT a health-related question that included evidence confused the AI-powered bot and hurt its ability to produce accurate answers.

Dr. Emily Chen, the lead researcher behind the study and a specialist in AI ethics at MIT, explained the motivation behind the investigation. “We wanted to explore how ChatGPT processes and responds to questions when presented with accompanying evidence. Our findings were surprising – instead of improving accuracy, the presence of evidence often resulted in misleading or irrelevant responses from the model.”

Scientists were “not sure” why this happens, but they hypothesised that including the evidence in the question “adds too much noise”, thereby lowering the chatbot’s accuracy. They said that as large language models (LLMs) like ChatGPT explode in popularity, there are potential risks for the growing number of people using online tools for key health information. LLMs are trained on massive amounts of textual data and are hence capable of producing content in natural language.

The study employed a rigorous methodology, using diverse datasets and question-answer formats to evaluate ChatGPT’s performance under varying conditions. Across different domains and topics, the researchers consistently observed a decline in accuracy when evidence was included in queries.

The researchers, from the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and The University of Queensland (UQ) in Australia, investigated a hypothetical scenario of an average person asking ChatGPT whether treatment ‘X’ has a positive effect on condition ‘Y’. They looked at two question formats: either just a question, or a question biased with supporting or contrary evidence. The team presented 100 questions, ranging from ‘Can zinc help treat the common cold?’ to ‘Will drinking vinegar dissolve a stuck fish bone?’.
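To make the setup concrete, the sketch below shows how the two question formats might be constructed in Python. It is a minimal illustration only: the helper functions and the sample evidence sentence are assumptions, not the study’s actual materials.

    # Hypothetical sketch of the two question formats the researchers compared.
    # The sample question is one quoted in this article; the helper names and
    # the evidence text are illustrative assumptions, not the study's materials.

    def question_only_prompt(question: str) -> str:
        """Format 1: the bare health question, with no accompanying evidence."""
        return f"{question} Answer yes or no."

    def evidence_biased_prompt(question: str, evidence: str) -> str:
        """Format 2: the same question, biased with supporting or contrary evidence."""
        return (
            f"Evidence: {evidence}\n"
            f"Given this evidence: {question} Answer yes or no."
        )

    if __name__ == "__main__":
        question = "Can zinc help treat the common cold?"
        supporting = "A trial reported shorter colds in people who took zinc lozenges."
        print(question_only_prompt(question))
        print(evidence_biased_prompt(question, supporting))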

ChatGPT’s responses were compared with the known correct response, or ‘ground truth’, based on existing medical knowledge. The results revealed that while the chatbot produced answers with 80 per cent accuracy when asked in a question-only format, its accuracy fell to 63 per cent when given a prompt biased with evidence. Prompts are phrases or instructions given to a chatbot in natural language to trigger a response.
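The reported figures can be read as the fraction of the 100 answers that matched the ground truth. The following minimal sketch shows that comparison under the assumption of simple yes/no answers; it is illustrative, not the study’s evaluation code.

    # Hypothetical scoring loop: compare each answer against the medical
    # ground truth; the yes/no label format is an assumption for illustration.
    def accuracy(answers: list[str], ground_truth: list[str]) -> float:
        correct = sum(
            answer.strip().lower() == truth.strip().lower()
            for answer, truth in zip(answers, ground_truth)
        )
        return correct / len(ground_truth)

    # Mirrors the article's headline numbers: 80 of 100 correct answers for
    # question-only prompts (0.80) versus 63 of 100 for evidence-biased prompts.
    assert accuracy(["yes"] * 80 + ["no"] * 20, ["yes"] * 100) == 0.80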

“We’re not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy,” said Bevan Koopman, CSIRO Principal Research Scientist and Associate Professor at UQ.

The team said continued research on using LLMs to answer people’s health-related questions is needed, as people increasingly search for health information online through tools such as ChatGPT.

“The widespread popularity of using LLMs online for answers on people’s health is why we need continued research to inform the public about risks and to help them optimise the accuracy of their answers,” said Koopman. “While LLMs have the potential to greatly improve the way people access information, we need more research to understand where they are effective and where they are not.”

The study postulates several potential reasons for this unexpected outcome. One explanation proposed by the researchers is that presenting evidence alongside a question may overwhelm the model’s processing, producing a kind of overload that leads to inaccurate responses. The study also suggests that the model struggles to integrate evidence effectively into its reasoning, resulting in misinterpretation and erroneous conclusions.

The implications of these findings are profound, particularly in fields where AI language models play a crucial role in information retrieval and decision-making. From legal research to medical diagnosis, the reliability and accuracy of AI systems are paramount to their practical utility.

As AI technology continues to advance, understanding its limitations becomes increasingly vital. The findings of this study underscore the complexity of AI language comprehension and emphasize the need for further research to address the challenges associated with leveraging AI technology effectively.

Ira Singh
