Google's AI-based Vision-Language-Action robot model is ready, capable of performing a wide range of tasks
New innovations appear in technology every day, and the field is advancing faster than most of us expect. Google has demonstrated its first Vision-Language-Action (VLA) model for robot control, which combines robotic data with semantic and visual understanding to act on a wide range of everyday commands.
What does this model do?
The model can interpret commands it has not seen before and respond to user instructions by performing basic reasoning, such as reasoning about object categories or high-level descriptions.
What is Robotic Transformer 2?
Robotic Transformer 2 (RT-2) is a new vision-language-action (VLA) model that learns from both web and robotics data and translates this knowledge into generalized instructions for robotic control. RT-2's flexible approach enables the robot to adapt how it moves its arm to pick up, for example, a cube or another toy.
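To make the idea concrete, here is a minimal sketch of what an inference loop for such a VLA model could look like: a camera image and a text instruction go into the model, which emits discrete action tokens that are then converted back into a continuous robot command. The class name, the eight-dimensional action layout, and the bin count are assumptions for illustration, not Google's published API; the model itself is stubbed out with random tokens.

```python
import numpy as np

NUM_BINS = 256      # assumed discretization of each action dimension
ACTION_DIMS = 8     # assumed layout: terminate flag, xyz, roll/pitch/yaw, gripper


def detokenize_action(tokens, low=-1.0, high=1.0):
    """Map discrete action tokens back to continuous robot commands."""
    tokens = np.asarray(tokens, dtype=np.float64)
    return low + (tokens / (NUM_BINS - 1)) * (high - low)


class StubVLAPolicy:
    """Placeholder for a vision-language-action model: RT-2 weights are not
    public, so this stub simply returns random action tokens."""

    def predict_tokens(self, image, instruction):
        return np.random.randint(0, NUM_BINS, size=ACTION_DIMS)


def control_step(policy, camera_image, instruction):
    tokens = policy.predict_tokens(camera_image, instruction)  # model inference
    action = detokenize_action(tokens)                          # tokens -> continuous command
    return action                                               # handed to the robot controller


if __name__ == "__main__":
    policy = StubVLAPolicy()
    fake_image = np.zeros((224, 224, 3), dtype=np.uint8)
    print(control_step(policy, fake_image, "pick up the blue cube"))
```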
It will understand what a person needs
According to one official, incorporating chain-of-thought reasoning allows RT-2 to perform multi-stage semantic reasoning, such as determining which object could be used as an improvised hammer or which type of energy drink a tired person needs.
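One way to picture this is an output format in which the model first writes a short textual plan and only then emits action tokens. The "Plan:" / "Action:" layout below is an assumed illustration of the chain-of-thought idea, not the exact format used by RT-2.

```python
def parse_cot_output(model_output: str):
    """Split a chain-of-thought style response into a reasoning step and
    the action tokens that follow it (format is assumed for illustration)."""
    plan_part, _, action_part = model_output.partition("Action:")
    plan = plan_part.replace("Plan:", "").strip()
    tokens = [int(t) for t in action_part.split()]
    return plan, tokens


# Hypothetical response to "I need to hammer a nail, what can I use?"
response = "Plan: pick up the rock, it can serve as a hammer. Action: 1 128 91 241 5 101 127 217"
plan, tokens = parse_cot_output(response)
print(plan)    # the intermediate reasoning step
print(tokens)  # discrete tokens to be de-tokenized into a robot command
```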
Successful after 6,000 robotic trials
The model builds on Robotic Transformer 1 (RT-1), which was trained on multi-task demonstrations. The team conducted a series of quantitative and qualitative experiments with RT-2 models over 6,000 robotic trials. As RT-2 demonstrates, vision-language models can be transformed into powerful vision-language-action models that directly control a robot by combining VLM pre-training with robotic data.
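The reverse direction is how robotic demonstrations can be folded into the same training pipeline as web data: each continuous robot action is discretized and written out as a string of integer tokens, so it can be trained on like ordinary text. The bin count and value range below are assumptions for illustration only.

```python
import numpy as np

NUM_BINS = 256  # assumed number of bins per action dimension


def tokenize_action(action, low=-1.0, high=1.0):
    """Discretize a continuous robot action into integer bins so it can be
    treated as plain text tokens during training (ranges are assumed)."""
    action = np.clip(np.asarray(action, dtype=np.float64), low, high)
    bins = np.round((action - low) / (high - low) * (NUM_BINS - 1))
    return " ".join(str(int(b)) for b in bins)


# A demonstrated action becomes a plain string that can sit alongside web
# text in a training batch, e.g. "instruction -> action tokens".
print(tokenize_action([0.0, 0.5, -0.25, 1.0]))  # prints the binned token string
```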
Why does RT-2 matter?
Google DeepMind's RT-2 is not just a simple and effective modification of existing VLM models; it points toward a general-purpose physical robot that can reason, solve problems, and interpret information to perform a wide range of real-world tasks.