Inside UAE’s most ambitious AI model: Teaching robots how to think

Researchers at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) have developed PAN, an AI model designed to go beyond conventional video generation systems. Unlike text-to-video models such as OpenAI’s Sora and Google’s Veo, which focus on visual replication, PAN uses a hybrid architecture intended to model physical dynamics, cause-and-effect relationships, and real-world semantics.

The system functions as a world model, enabling robotic agents to simulate thousands of interactions within virtual environments before acting physically. This approach addresses a fundamental challenge in robotics development: the prohibitive cost and risk of real-world training. Where current robotics companies require hundreds of human operators and thousands of repetitive demonstrations to teach basic skills, PAN is said to accelerate learning roughly 430,000-fold relative to physical training methods.
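To put the 430,000-fold figure in perspective, the short calculation below converts between simulation time and equivalent physical practice. It is a back-of-the-envelope sketch, not a description of how the figure was measured: it assumes the speedup can be read as a simple ratio of simulated experience to wall-clock compute time, and the function names are illustrative.

```python
# Assumption: the reported speedup is a plain ratio of
# "hours of physical practice covered" to "hours of compute".
SPEEDUP = 430_000
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours in an average year


def simulated_years(compute_hours: float, speedup: float = SPEEDUP) -> float:
    """Years of equivalent physical practice covered by the given compute time."""
    return compute_hours * speedup / HOURS_PER_YEAR


def compute_hours_needed(practice_years: float, speedup: float = SPEEDUP) -> float:
    """Compute hours needed to cover the given years of physical practice."""
    return practice_years * HOURS_PER_YEAR / speedup


if __name__ == "__main__":
    print(f"1 compute hour covers about {simulated_years(1.0):.1f} years of practice")
    print(f"10 years of practice needs about {compute_hours_needed(10.0):.2f} compute hours")
```

Under that reading, a single hour of simulation corresponds to roughly 49 years of continuous physical practice, which matches the article's framing of decades compressed into hours.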

Jon Carvill, Vice President of Marketing and Communications at MBZUAI, explains that PAN’s architecture combines diffusion models for visual fidelity with large language model capabilities that preserve world semantics over extended sequences. This allows the system to keep an internal memory of scene elements and object movements, updating its understanding frame by frame rather than generating a complete video in a single pass.
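The frame-by-frame idea can be illustrated with a toy example. The sketch below is not PAN's implementation: the grid world, the `SceneState` class, and the `step`, `render`, and `rollout` functions are all hypothetical stand-ins. It only shows the general pattern of keeping an internal state, advancing it one action at a time, and decoding each state into a frame, rather than producing the whole sequence at once.

```python
from dataclasses import dataclass, field

# Hypothetical action set for a toy grid world (not from PAN).
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


@dataclass
class SceneState:
    """Internal memory of the scene: object name -> (x, y) position."""
    positions: dict = field(default_factory=dict)


def step(state: SceneState, obj: str, action: str) -> SceneState:
    """Predict the next state from the current state and one action.

    Returns a new state; the previous one is left untouched.
    """
    dx, dy = MOVES[action]
    x, y = state.positions[obj]
    new_positions = dict(state.positions)
    new_positions[obj] = (x + dx, y + dy)
    return SceneState(new_positions)


def render(state: SceneState, width: int = 5, height: int = 3) -> str:
    """Decode a state into a 'frame' (a text grid standing in for an image)."""
    rows = []
    for y in range(height - 1, -1, -1):  # top row first
        row = ""
        for x in range(width):
            cell = "."
            for name, pos in state.positions.items():
                if pos == (x, y):
                    cell = name[0]  # first letter marks the object
            row += cell
        rows.append(row)
    return "\n".join(rows)


def rollout(state: SceneState, obj: str, actions: list) -> tuple:
    """Advance the state one action at a time, rendering a frame per step."""
    frames = [render(state)]
    for action in actions:
        state = step(state, obj, action)  # update memory first
        frames.append(render(state))      # then decode the new frame
    return state, frames


if __name__ == "__main__":
    start = SceneState({"cup": (0, 0), "robot": (4, 2)})
    final, frames = rollout(start, "cup", ["right", "right", "up"])
    print("\n\n".join(frames))
```

Because the state is carried forward between steps, objects that are not acted on (the robot here) stay where they were in every frame, which is the kind of consistency over long sequences the article attributes to PAN's internal memory.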

The implications extend across multiple domains, from autonomous vehicles navigating complex traffic scenarios to household robots performing domestic tasks such as folding laundry and loading dishwashers. By compressing decades of physical practice into hours of computation, PAN could sharply reduce development costs and lower the barriers to entry for advanced robotics.

This development positions MBZUAI at the forefront of embodied AI research, the study of systems that understand physical consequences rather than merely recognizing patterns in data. The university’s distributed development model, which draws on teams in both Abu Dhabi and Silicon Valley, has accelerated PAN’s creation through clearly defined research pipelines and global talent integration.

Looking toward 2030, researchers envision PAN-like world models becoming the standard foundation for intelligent agents, enabling safe autonomous systems and AI that comprehends consequence rather than mere correlation.