The Institute of Industrial Artificial Intelligence under the Chinese Academy of Sciences has reached a significant milestone: its self-developed world model, PAIWorld, has topped the WorldArena leaderboard. The result, announced in late June 2026, is a notable one for the institute, a national-level research institution jointly established by the Chinese Academy of Sciences, Jiangsu Province, and Nanjing Municipality. It also highlights China’s growing capabilities in world models, a cutting-edge area of artificial intelligence focused on understanding and predicting the physical world.
WorldArena is widely regarded as the most authoritative evaluation benchmark in the world model domain. Co-launched by Tsinghua University and Princeton University, among other top academic institutions, the leaderboard assesses models across six dimensions: visual quality, motion quality, content consistency, physics adherence, 3D accuracy, and controllability. It serves as a rigorous testing ground for leading global models, attracting entries from top research labs and companies, including teams led by prominent figures such as Li Fei-Fei, as well as teams from Google, NVIDIA, Stanford University, and various Chinese tech firms. In the ranking updated on June 19, 2026, PAIWorld took the top position by overall score.

PAIWorld also performed strongly on several individual metrics. It achieved a notably high score in Motion Smoothness, reflecting the model’s ability to represent continuous real-world motion. More striking still, it significantly outperformed the runner-up in Trajectory Accuracy. In practice, this means PAIWorld can not only generate fluid, natural motion but also keep the trajectories of objects and cameras accurate over long prediction horizons, reducing trajectory drift and spatial misalignment. As team leader Xu Kai explained, trajectory prediction accuracy is not just a performance indicator but a critical safety metric, especially in applications such as autonomous driving and industrial manufacturing.
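To make the idea of trajectory drift concrete, here is a minimal sketch of one common way to quantify it: average and final displacement error between a predicted path and a reference path. This is an illustration only. The article does not state how WorldArena computes Trajectory Accuracy, and the function names and sample data below are hypothetical.

```python
import math

def average_displacement_error(predicted, reference):
    """Mean Euclidean distance between matching points of two trajectories."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)

def final_displacement_error(predicted, reference):
    """Euclidean distance between the last points of two trajectories."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    return math.dist(predicted[-1], reference[-1])

# Hypothetical data: the reference moves along the x-axis, while the
# prediction drifts sideways by 0.1 units per step.
reference = [(float(t), 0.0) for t in range(5)]
drifting = [(float(t), 0.1 * t) for t in range(5)]

ade = average_displacement_error(drifting, reference)
fde = final_displacement_error(drifting, reference)
print(f"average error: {ade:.2f}, final error: {fde:.2f}")
```

In this toy case the final error is larger than the average error because the drift accumulates with each step, which is the behavior a long-horizon trajectory metric is meant to expose.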
The technical success of PAIWorld is rooted in its architecture, which integrates 3D geometric priors with multi-view spatiotemporal modeling. By embedding real-world depth, surface geometry, and occlusion relationships into the generation process as explicit constraints, and by employing geometric rotary position encoding and multi-view attention mechanisms, the model achieves robust cross-view 3D perception and consistent simulation of the physical world. Looking ahead, the research team plans to further refine PAIWorld and use it to build a virtual training ground for embodied AI robots, enabling self-improvement and continuous evolution. The achievement not only demonstrates the model’s maturity in core technologies but also marks a significant step forward in applying world models to safer and more precise decision-making in complex, real-world scenarios.
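For readers unfamiliar with rotary position encoding, the following minimal sketch shows the standard scalar-position form of the technique, in which attention scores end up depending only on the relative offset between two positions. It is not PAIWorld's implementation: the article does not describe how the geometric variant derives positions from 3D structure, and the function names, vectors, and positions here are assumptions chosen for illustration.

```python
import math

def rotary_encode(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `position`."""
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, len(vec), 2):
        # Each pair has its own frequency; earlier pairs rotate faster.
        theta = position * base ** (-i / len(vec))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions are 3 apart, so the two scores should match.
score_a = dot(rotary_encode(q, 5), rotary_encode(k, 2))
score_b = dot(rotary_encode(q, 13), rotary_encode(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Because each pair of components is only rotated, the encoding preserves vector length while making the score sensitive to relative rather than absolute position. A geometry-aware variant would presumably replace the scalar position with quantities tied to 3D structure, though the details are not given in the article.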
