Machines might just be beating us at reading feelings, according to new research that has stirred debate among psychologists and tech experts alike.
Researchers recently pitted popular artificial intelligence models against humans in a battery of emotional intelligence tests used worldwide. These assessments are designed to see how well someone can identify the right response in tense, emotionally charged situations.
AI models including ChatGPT-4, Gemini, Claude, Copilot and DeepSeek were placed side by side with people, all facing the same challenging scenarios. The outcome was striking: the machines chose what experts deemed the “correct” response about eighty percent of the time, while the humans got just over half right.
The experiments were not limited to answering questions. The AI models were also asked to generate fresh test items matching the difficulty and style of the originals. Human judges rated the new items as comparably difficult, showing that AI could even step into the shoes of a test designer.
When Machines and Emotions Collide
As soon as these results landed, a wave of skepticism followed. Experts pointed out that the tests were multiple choice, a format far removed from the muddle of real life, where human reactions unfold unpredictably.
Taimur Ijlal, a finance and infosec expert, was blunt in his assessment: “So ‘beating’ a human on a test like this doesn’t necessarily mean the AI has deeper insight. It means it gave the statistically expected answer more often.”
Other voices echo caution. Nauman Jaffar, a founder at a mental health tech firm, pointed out that large language models excel at spotting patterns in words, tones and facial clues — but that does not prove genuine understanding. In his view, assuming true empathy from such recognition could be misleading.
Structured quizzes are where machines excel; life’s messier emotional exchanges frequently trip up both humans and algorithms. Jason Hennessey, who has experience analyzing search and language-processing AI, compares the situation to the classic “Reading the Mind in the Eyes Test.” When details change, from lighting to cultural cues, he notes, “AI accuracy drops off a cliff.”
Some experts see value in emotion detection capabilities, yet warn against mistaking test performance for deep connection. Wyatt Mayham, an IT consultant, quipped, “It’s like saying someone’s a great therapist because they scored well on an emotionally themed BuzzFeed quiz.”
Yet there are exceptions. Thousands of truck drivers in Brazil now interact with Aílton, a chat assistant that can tell when a person is angry, sad, or stressed with roughly eighty percent accuracy, a rate significantly higher than human dispatchers achieve.
Real-time examples show the assistant responding gently to distress calls, offering condolences and mental health resources when tragedy strikes. Its developer, Marcos Alves, believes sophisticated pattern analysis is a genuine strength, but he also concedes: “Real empathy is continuous and multimodal.”
He also contends that, whatever the limitations of laboratory tests, data from messaging apps indicate AI is frequently outpacing people at noticing subtle emotional currents in everyday conversation.
In many ways, AI appears to be learning the art of compassion, at least when it comes to picking the right answer. The trend is not confined to Western labs, either: DeepSeek, the Chinese startup whose model was among those tested, has been gaining popularity for its advances in language understanding.