Aging AI Chatbots Are Showing Signs of Dementia


A team of neurologists and data scientists at Tel Aviv University found that generative AI chatbots like ChatGPT and Google’s Gemini can eventually show signs of cognitive impairment that resemble the way the human brain degrades with age.

This degradation is more pronounced in earlier versions of the chatbots. The results were jarring enough that the researchers say if they had seen a similar mental decline in a human, they would be deeply concerned for that person’s health and safety.


Neurologists Roy Dayan and Benjamin Uliel ran several large language model chatbots, including ChatGPT, Claude, and Gemini, through a series of cognitive tests typically used to assess human brain function.

One of those tests is the Montreal Cognitive Assessment, or MoCA for short. It’s designed to measure a range of brain functions, including executive function, focused attention, and short-term memory, and it’s used to screen for a wide variety of cognitive impairments, from Parkinson’s disease to Alzheimer’s and other forms of dementia.

The researchers found a degree of “cognitive decline that seems comparable to neurodegenerative processes in the human brain.” Among the chatbots tested, ChatGPT 4o scored the highest, with 26 out of 30.

That doesn’t sound too bad until you learn that a 26 sits right at the cutoff for mild cognitive impairment. So even the best-scoring chatbot showed signs of brain-scrambling. The very first version of Google’s Gemini, Gemini 1, scored a 16, an indicator of severe cognitive impairment. If Gemini 1 were a human, it would probably be staring blankly at a wall with no memory of who it was or how it got there.

Many of the symptoms the chatbots exhibited eerily mirrored the cognitive issues seen in dementia patients. The models scored poorly on visuospatial and executive function tasks and had to be explicitly told how to solve some of the brainteasers they were presented with; otherwise, they were unable to find a solution at all.

Another, equally troubling test administered to the chatbots was the Boston Diagnostic Aphasia Examination. It found that the chatbots lacked empathy, which is often an early warning sign of frontotemporal dementia. FTD isn’t just one type of dementia but a catchall term for several conditions that progressively eat away at the brain.

What seems especially troubling is that while newer chatbot models perform well enough, “older” models like Gemini 1 aren’t actually that old. Google’s Gemini 1 was released in December 2023, yet if it were a human, it would be diagnosed with dementia. Generative AI chatbots seem to age and degrade like milk.

The authors of the study say it’s highly likely that a chatbot will eventually score immaculately on cognitive assessment tests. For now, though, there is a real risk of cognitive degradation, meaning the results of a query punched into even the most advanced chatbot maybe shouldn’t be trusted.




