Embodied AI: How do AI-powered robots perceive the world? [+video]

Robots have proliferated in recent years in smart cities, factories and homes, but most of the robots we interact with are controlled by classical handcrafted algorithms. These robots have a narrow goal and learn little from their surroundings. In contrast, artificial intelligence (AI) agents (robots, virtual assistants or other intelligent systems) that can interact with and learn from a physical environment are referred to as embodied AI. These agents are equipped with sensors (cameras, pressure sensors, accelerometers, etc.) that capture data from their surroundings, along with AI systems that can analyze and “learn” from the acquired data.

Figure: AI-powered robots learn through interaction with a physical environment.

Through trial and error, the AI agent develops a “world view”: an abstract representation and understanding of the spatial or temporal dimensions of our world. It learns to reach its goal, whether the goal is to walk, stack boxes or something else entirely.

Embodied AI can transform industries and improve lives. Think of enhancing the manufacturing process, making entertainment and games more interactive and immersive, improving medical triage, surgery and elderly assistance, and making smart warehouses much more efficient and automated. The need for embodied AI is certainly there.

The effects of an aging population and labor shortages are already being felt, especially in the developed world [1]. As a result, robot density in the manufacturing industry has increased significantly in recent years. In the U.S., the density grew to 255 robots per 10,000 employees, a 45% increase from 2015 [2].

Figure: AI-powered robotics has immense potential for improving society, including manufacturing, digital healthcare, entertainment and warehouses.

What is needed for embodied AI to proliferate

At Qualcomm AI Research, we are working on applications of generative modeling to embodied AI and robotics, to go beyond classical robotics and enable capabilities such as:

  • Open vocabulary scene understanding.
  • Natural language interface.
  • Reasoning and common sense via large language models (LLMs).
  • Closed-loop control and dynamic actions via LLMs or diffusion models.
  • Vision-language-action models.

Robotics needs data efficiency, low latency, enhanced privacy and sensor processing. All of these requirements can be met through on-device AI, which is why Qualcomm Technologies has been developing platforms, such as the Qualcomm Robotics Platforms, to support the creation of more productive, autonomous and advanced robots. These platforms include the Qualcomm AI Engine, which provides capabilities that enable innovative applications.

Figure: AI processing at the edge meets the needs of embodied AI.

A data-efficient robot motion planning architecture

While AI processing at the edge is a good basis for building embodied AI applications, one critical issue remains to be solved. In contrast to internet AI, which learns from static datasets (e.g., ImageNet, which contains 2D images) to solve various tasks, embodied AI learns by interacting with a physical environment. Such data is not readily available on the internet and is expensive to acquire. The Qualcomm AI Research team has developed a novel data-efficient architecture to improve robots’ perception of their environment. We call this architecture “Geometric Algebra Transformers” (GATr); sign up for my webinar to learn more.

GATr accounts for the geometric structure of the physical environment through geometric algebra representations and equivariance, and it has the scalability and expressivity of transformers. Experiments show strong performance, even with little data. At its core, GATr is a general-purpose architecture for geometric data with three components: geometric algebra representations, equivariant layers and a transformer architecture.

Geometric algebra representations

GATr uses a mathematical framework called geometric algebra to represent geometric data and perform computations on that data. By embedding different kinds of geometric data into a single geometric algebra, GATr can process a variety of geometric data types, making it suitable for a wide range of applications without requiring modifications to the network architecture.
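To make the idea of a single shared representation concrete, here is a minimal sketch in plain Python. It assumes a 16-component multivector, as in a 3D projective geometric algebra; the basis ordering, the helper names (`embed_scalar`, `embed_plane`, `embed_point`) and the sign conventions are illustrative assumptions, not GATr's actual code.

```python
# Assumed basis ordering for a 16-component multivector (illustrative only).
BASIS = ["1", "e0", "e1", "e2", "e3",
         "e01", "e02", "e03", "e12", "e13", "e23",
         "e012", "e013", "e023", "e123", "e0123"]


def blank():
    """Return an all-zero multivector."""
    return [0.0] * len(BASIS)


def embed_scalar(value):
    """A scalar occupies the grade-0 slot."""
    mv = blank()
    mv[BASIS.index("1")] = float(value)
    return mv


def embed_plane(normal, offset):
    """A plane occupies the grade-1 (vector) slots: offset in e0, normal in e1..e3."""
    mv = blank()
    mv[BASIS.index("e0")] = float(offset)
    for name, n in zip(("e1", "e2", "e3"), normal):
        mv[BASIS.index(name)] = float(n)
    return mv


def embed_point(position):
    """A point occupies the grade-3 (trivector) slots.
    Slot order and signs are simplified assumptions."""
    mv = blank()
    mv[BASIS.index("e123")] = 1.0  # homogeneous coordinate
    for name, p in zip(("e023", "e013", "e012"), position):
        mv[BASIS.index(name)] = float(p)
    return mv


# Three different kinds of geometric data, one common input shape.
inputs = [embed_scalar(0.5),
          embed_plane((0.0, 0.0, 1.0), 2.0),
          embed_point((1.0, 2.0, 3.0))]
print([len(mv) for mv in inputs])
```

Because every object ends up as a vector of the same length, the same network can consume scalars, planes and points side by side, with the occupied slots indicating what kind of object each token is.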

Equivariant layers

The innovation we bring with equivariant neural networks is that no matter how you rotate or move an object, the model will still identify it, and its outputs transform consistently with its inputs. This is key for improving the data efficiency of AI-powered robots, because the model does not have to relearn the same object in every pose.
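The property can be checked numerically. The sketch below is a toy 2D illustration, not a neural network: it tests whether applying a rotation and shift before a function gives the same result as applying them after. The function names and the example point cloud are made up for this illustration.

```python
import math


def rotate(points, angle):
    """Rotate 2D points counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def translate(points, shift):
    """Shift 2D points by `shift`."""
    return [(x + shift[0], y + shift[1]) for x, y in points]


def centroid(points):
    """Mean position: equivariant to rotations and translations."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def corner_max(points):
    """Per-axis maximum: NOT equivariant to rotations."""
    return (max(x for x, _ in points), max(y for _, y in points))


def is_equivariant(f, points, angle, shift, tol=1e-9):
    """Compare f(transform(points)) with transform(f(points))."""
    moved_then_f = f(translate(rotate(points, angle), shift))
    f_then_moved = translate(rotate([f(points)], angle), shift)[0]
    return all(abs(a - b) < tol for a, b in zip(moved_then_f, f_then_moved))


cloud = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.5, 3.0)]
print(is_equivariant(centroid, cloud, math.pi / 3, (1.0, -2.0)))
print(is_equivariant(corner_max, cloud, math.pi / 3, (1.0, -2.0)))
```

The centroid passes the check and the per-axis maximum fails it. An equivariant network is built so that every layer passes this kind of check by construction.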

Figure: Equivariant neural networks will enhance AI-powered robotics. When we transform network inputs, the outputs transform consistently.

Transformer architecture

GATr is based on the transformer architecture, one of the most successful generative AI architectures. The fundamental operation in a transformer is called self-attention, for which we propose an equivariant alternative while preserving the excellent scalability properties of classical self-attention.
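As a rough intuition for why attention can be made equivariant, consider a stripped-down attention over 2D vectors with no learned projections. This is a toy stand-in and not GATr's multivector attention. The attention weights depend only on dot products, which rotations leave unchanged, so rotating the inputs simply rotates the outputs.

```python
import math


def rotate(v, angle):
    """Rotate one 2D vector counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vector_attention(tokens):
    """Toy self-attention: each token is its own query, key and value."""
    outputs = []
    for q in tokens:
        weights = softmax([dot(q, k) for k in tokens])  # rotation-invariant
        outputs.append(tuple(
            sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(2)
        ))
    return outputs


tokens = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
angle = 0.7

rotated_after = [rotate(v, angle) for v in vector_attention(tokens)]
rotated_before = vector_attention([rotate(v, angle) for v in tokens])

# Largest coordinate-wise difference between the two orders of operation.
max_gap = max(abs(a - b)
              for u, w in zip(rotated_after, rotated_before)
              for a, b in zip(u, w))
print(max_gap)
```

The gap is zero up to floating-point rounding. A real equivariant transformer also needs its learned linear maps and nonlinearities to respect the symmetry, which this sketch leaves out.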

 

Figure: GATr performs well compared to other methods, even with little data.

GATr outperforms other state-of-the-art architectures

Our approach to robot path planning works much like generating an image with a diffusion model, except that instead of denoising an image we denoise a robot trajectory. We use GATr as the denoising network rather than the more standard U-Net.
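The denoising loop can be sketched in a few lines. The example below is a simplified illustration under loud assumptions: it uses a deterministic DDIM-style update with a linear noise schedule, and `toy_denoiser` is a hypothetical stand-in for the learned network (GATr in our case) that always predicts a straight line between start and goal. None of the names come from our codebase.

```python
import math
import random


def straight_line(start, goal, n):
    """n evenly spaced 2D waypoints from start to goal (n >= 2)."""
    return [tuple(s + (g - s) * i / (n - 1) for s, g in zip(start, goal))
            for i in range(n)]


def toy_denoiser(noisy_traj, t, start, goal):
    """Hypothetical stand-in for the learned denoising network.
    It ignores its noisy input and predicts the clean trajectory directly."""
    return straight_line(start, goal, len(noisy_traj))


def sample_trajectory(start, goal, n_waypoints=8, n_steps=50, seed=0):
    rng = random.Random(seed)
    # alpha_bar[t] runs from 1.0 (clean, t=0) to 0.0 (pure noise, t=n_steps).
    alpha_bar = [1.0 - t / n_steps for t in range(n_steps + 1)]
    # Start from a trajectory of pure Gaussian noise.
    traj = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_waypoints)]
    for t in range(n_steps, 0, -1):
        x0_pred = toy_denoiser(traj, t, start, goal)
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        new_traj = []
        for x_t, x0 in zip(traj, x0_pred):
            # Noise implied by the current sample and the clean prediction.
            eps = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
                   for xi, x0i in zip(x_t, x0)]
            # Re-noise the prediction to the (lower) noise level of step t-1.
            new_traj.append(tuple(
                math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * e
                for x0i, e in zip(x0, eps)
            ))
        traj = new_traj
    return traj


trajectory = sample_trajectory((0.0, 0.0), (1.0, 2.0))
print(trajectory[0], trajectory[-1])
```

With this toy denoiser the loop converges to the straight line from start to goal. In practice the denoiser is a trained network conditioned on the scene, and the sampled trajectories reflect what it learned from demonstrations.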

We tested our method on several tasks, including robotic block stacking. As the graph above shows, our method outperforms all previous methods with 1% of the training data. As we scale the number of items, our method continues to outperform them. GATr scales to tens of thousands of tokens, outperforming the geometric deep learning baselines.

Video: GATr Data Collection (Sep 15, 2023 | 1:20)

Towards making embodied AI a reality

We believe that embodied AI will benefit society in manufacturing, healthcare and more. Our model architecture for data-efficient robot motion planning is just one of the embodied AI projects that the Qualcomm AI Research team is working on. I recommend you also check out our work “Uncertainty-driven Affordance Discovery for Efficient Robotics Manipulation”, which aims to help AI-powered robots make decisions.

On-device generative AI will continue to play a fundamental role in embodied AI. Furthermore, we believe that equivariance enables AI to understand 3D images and videos more efficiently. Stay tuned for more research in this direction.

References

1: United Nations, World Population Prospects, 2017

2: Bloomberg, 2023

 


About the Author
Taco Cohen, Engineer, Principal, Qualcomm Technologies Netherlands B.V.