Google DeepMind: Google is developing a human-like thinking AI agent that can handle many complex tasks in moments...
Artificial intelligence is no longer limited to writing emails or planning vacation trips for you. Google DeepMind recently launched a new version of its "Scalable Instructable Multiworld Agent," or SIMA. SIMA 2 is significantly more advanced than the previous model, demonstrating human-like understanding, planning, and learning capabilities in virtual worlds. This upgrade builds on the first SIMA model released in March 2024 and is powered by Google's Gemini models.
How does SIMA 2 work?
SIMA 2 understands a given task by taking visual input from a 3D game world. If a user gives it an instruction like "build a shelter" or "find the red house," it first breaks the goal down into smaller steps and then completes them one by one using keyboard and mouse inputs. In this way, the AI transforms instructions into actual actions based on what it sees on the screen.
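The perceive-plan-act loop described above can be sketched in a few lines. This is a purely illustrative toy, not SIMA 2's actual architecture (which is not public at this level of detail): the function names, the hard-coded plan table, and the stubbed action layer are all assumptions for demonstration.

```python
def plan_steps(goal: str) -> list[str]:
    """Break a high-level goal into smaller steps (toy lookup table,
    standing in for a Gemini-backed planner)."""
    plans = {
        "build a shelter": ["gather wood", "craft planks", "place walls", "place roof"],
        "find the red house": ["scan surroundings", "walk toward red building"],
    }
    return plans.get(goal, [goal])  # unknown goals pass through as a single step

def execute_step(step: str) -> str:
    """Stand-in for issuing keyboard/mouse actions based on the current screen."""
    return f"done: {step}"

def run_agent(goal: str) -> list[str]:
    """Plan sub-steps for a goal, then complete them one by one."""
    return [execute_step(step) for step in plan_steps(goal)]

print(run_agent("build a shelter"))
```

The point of the sketch is the control flow: a single instruction is decomposed before any low-level action is taken, and each sub-step is executed sequentially against the environment.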
Demonstrated Excellence in New Games
DeepMind also tested SIMA 2 in game environments it had never encountered before. These included MineDojo (a research platform based on Minecraft) and ASKA (a Viking-themed survival game). In both, SIMA 2 demonstrated better adaptation and higher task success rates than the previous version.
It can also understand various types of input, such as sketches, emojis, or instructions in different languages. The AI can apply concepts learned from one game to another, significantly increasing its learning capacity.
What could SIMA 2 change in the future?
The company believes that the 3D game world is a testing ground for this AI model. Success in this could lead to the creation of superior AI agents capable of operating in the real world. DeepMind's goal is to create AI that understands multiple languages, plans, and controls machines in the real world. SIMA 2 lays a strong foundation in this direction and could transform general-purpose robotics in the future.
Limitations Remain
According to DeepMind, SIMA 2 has demonstrated significant capabilities but still struggles with long-term memory, very complex multi-step reasoning, and highly precise low-level control. These limitations rule out deploying it directly in physical robots for now.
Disclaimer: This content has been sourced and edited from Amar Ujala. While we have made modifications for clarity and presentation, the original content belongs to its respective authors and website. We do not claim ownership of the content.

