Synopsis
Research from Johns Hopkins University indicates that current AI models struggle to grasp the social dynamics and context necessary for effective human interaction, highlighting a significant limitation in AI technology development.

Key Takeaways
- AI models lack understanding of social interactions.
- Humans outperform AI in interpreting dynamic scenes.
- AI needs to recognize human intentions for practical applications.
- Research highlights a blind spot in AI model development.
- Current AI neural networks were modeled on brain regions that process static images.
New Delhi, April 24 (NationPress) Artificial intelligence (AI) systems are currently unable to grasp the social dynamics and context essential for effective human interaction, as indicated by research from Johns Hopkins University published on Thursday.
The findings reveal that humans outperform contemporary AI models in describing and interpreting social interactions in dynamic environments. This ability is crucial for technologies such as self-driving cars, assistive robots, and other applications that depend on AI systems navigating the real world, according to the researchers from this prestigious US institution.
“For instance, an AI intended for a self-driving car must understand the intentions, goals, and actions of both drivers and pedestrians. It should be able to anticipate which direction a pedestrian is about to move or discern if two individuals are engaging in conversation or preparing to cross the street,” explained Leyla Isik, the lead author and an assistant professor of cognitive science at Johns Hopkins University.
“Whenever AI is required to interact with humans, it is imperative for it to recognize what people are doing. This highlights the current limitations of these systems,” added Isik.
To evaluate how AI models compare with human perception, researchers had human participants observe three-second video clips and rate the features vital for understanding social interactions on a scale from one to five.
The clips showcased individuals either engaging with one another, performing activities side by side, or undertaking solo actions.
Subsequently, over 350 AI models, spanning language, video, and image systems, were tasked with predicting the human judgments about the videos as well as the brain activity people showed while viewing them. The text-only large language models were instead asked to evaluate short, human-written captions describing the clips.
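For readers curious about the mechanics, the core comparison can be thought of as correlating each model's predicted ratings with the averaged human ratings, clip by clip. The minimal sketch below illustrates that idea only; the function name, ratings, and feature labels are invented placeholders, not the study's actual data or code.

```python
import numpy as np

def score_model(human_ratings: np.ndarray, model_ratings: np.ndarray) -> float:
    """Pearson correlation between human and model ratings across video clips.

    human_ratings: mean human rating (1-5) per clip for one social feature
    model_ratings: the model's predicted rating for the same clips
    """
    # np.corrcoef returns a 2x2 correlation matrix; [0, 1] is the
    # human-vs-model correlation coefficient.
    return float(np.corrcoef(human_ratings, model_ratings)[0, 1])

# Made-up ratings for five three-second clips on one hypothetical feature,
# e.g. "are these people interacting with each other?"
humans = np.array([4.2, 1.8, 3.5, 4.9, 2.1])
model = np.array([3.0, 2.5, 3.1, 3.2, 2.9])  # a model that barely discriminates

print(f"human-model correlation: {score_model(humans, model):.2f}")
```

A low correlation under a scheme like this would indicate that a model's judgments track human perception of social scenes poorly, which is the kind of gap the researchers report.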
These results stood in stark contrast to AI's proficiency at analyzing still images.
“Merely recognizing objects and faces in an image is insufficient. That was merely the first phase of AI development. However, real life is dynamic. We require AI to comprehend the unfolding narrative within a scene. Grasping the relationships, context, and dynamics of social interactions represents the subsequent milestone; this research indicates a potential blind spot in AI model advancement,” elaborated Kathy Garcia, a doctoral student in Isik’s lab.
Researchers speculate that this limitation arises because AI neural networks were modeled after the brain's regions responsible for processing static images, which differ from those that handle dynamic social scenarios.