Developer Blog

Driving photorealistic 3D avatars in real time with on-device 3D Gaussian splatting

Written by

Michel Sarkis

Dec 6, 2024

Our breakthrough research and optimizations are making possible immersive experiences with realistic 3D avatars at 60 FPS on phones and XR devices using 3D Gaussian splatting technology.

The engineering community thought that rendering realistic 3D avatars with 3D Gaussian splatting on a battery-operated device would be too computationally expensive for the foreseeable future. At NeurIPS 2024, we showed that this is not true: 3D Gaussian splatting can run in real time on edge devices.

3D Gaussian splatting enables realistic digital twins

Gaussian splatting is an emerging technique for 3D representation, valued for the realism it provides. The original 3D Gaussian splatting (3DGS) method captures a set of images, aligns them using COLMAP, and estimates the splatting parameters through optimization [1].

Recently, several papers [2,3] have taken this representation to the next level with Gaussian splatting avatars. The key idea is to train a 3DGS neural network, acting as a decoder, to estimate the Gaussian splatting parameters of an avatar, conditioned on the expression vector and the ID of the avatar. This idea was extended in [4] with the ability to enroll an avatar from a commodity mobile device, which is a very compelling path to follow.
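To make the conditioning idea concrete, here is a minimal NumPy sketch of a decoder that takes an expression vector and an identity embedding and outputs one parameter vector per UV texel. Everything in it is an assumption for illustration: the dimensions, the two-layer network, the 14-value splat layout, and the random weights standing in for trained ones. It is not the architecture used in [2-4] or in our system.

```python
import numpy as np

# All sizes below are hypothetical, chosen only to keep the example small.
EXPR_DIM = 64        # length of the expression vector
ID_DIM = 32          # length of the per-avatar identity embedding
UV_SIZE = 16         # UV map side length (the post uses 512)
SPLAT_CHANNELS = 14  # assumed: 3 position, 3 scale, 4 rotation, 1 opacity, 3 color


class ToySplatDecoder:
    """Maps (expression, identity) to a UV map of splat parameters."""

    def __init__(self, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = EXPR_DIM + ID_DIM
        out_dim = UV_SIZE * UV_SIZE * SPLAT_CHANNELS
        # Random weights stand in for trained ones.
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, out_dim)) * 0.1

    def __call__(self, expression, identity):
        # Conditioning: the identity embedding is concatenated to the expression.
        x = np.concatenate([expression, identity])
        h = np.maximum(x @ self.w1, 0.0)  # ReLU
        return (h @ self.w2).reshape(UV_SIZE, UV_SIZE, SPLAT_CHANNELS)


decoder = ToySplatDecoder()
rng = np.random.default_rng(1)
identity = rng.standard_normal(ID_DIM)

# Same avatar, two different expressions.
neutral = decoder(np.zeros(EXPR_DIM), identity)
smile = decoder(rng.standard_normal(EXPR_DIM), identity)
print(neutral.shape)
```

The same identity embedding with two different expression vectors yields two different sets of splat parameters, which is the behavior a conditioned decoder is meant to provide.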

To represent an avatar, we assume that a retopologized mesh exists for skinning and tracking. The mesh is also assumed to be aligned with a UV map. In a sense, each texel of the UV map is a container storing all the splat parameters, as shown below.

Figure 1: Avatar UV map and corresponding mesh
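One way to picture the texel-as-container idea is as a multi-channel image whose channels hold the splat parameters. The sketch below assumes a particular channel layout (position, scale, rotation quaternion, opacity, RGB color); the post does not specify the actual layout, so treat the names and sizes as hypothetical.

```python
import numpy as np

H = W = 4  # tiny UV map for illustration; the post uses 512x512

# Assumed channel layout per texel (not specified in the post).
LAYOUT = {
    "position": slice(0, 3),
    "scale": slice(3, 6),
    "rotation": slice(6, 10),   # quaternion (w, x, y, z)
    "opacity": slice(10, 11),
    "color": slice(11, 14),     # RGB
}
CHANNELS = 14

# The UV map is an H x W image with one parameter vector per texel.
uv_map = np.zeros((H, W, CHANNELS), dtype=np.float32)
uv_map[..., LAYOUT["rotation"]] = [1.0, 0.0, 0.0, 0.0]  # identity rotation

# Flattening the two spatial axes gives one row per splat.
splats = uv_map.reshape(-1, CHANNELS)
positions = splats[:, LAYOUT["position"]]


def texel_to_splat_index(row, col, width=W):
    """Row-major index of the splat stored at texel (row, col)."""
    return row * width + col


print(splats.shape, positions.shape, texel_to_splat_index(2, 3))
```

Because the layout is a plain image, a renderer can read the flattened array directly as a splat list without any per-texel bookkeeping.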

Under this concept, the number of splats would correspond to the size of the UV map. For example, a 512×512 UV map would have 262,144 splats. This representation with a large number of splats would provide great quality but is problematic for edge devices since it also requires high compute and data bandwidth. How can we efficiently enable this concept to run Gaussian splatting on edge devices?
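The arithmetic behind this concern can be sketched in a few lines. The splat count follows directly from the UV map size; the per-splat size (14 float32 values) is an assumption made here for illustration, not a figure from our system.

```python
def splat_budget(uv_size, floats_per_splat=14, bytes_per_float=4, fps=60):
    """Return (splat count, bytes per frame, bytes per second).

    floats_per_splat and bytes_per_float are illustrative assumptions.
    """
    num_splats = uv_size * uv_size
    bytes_per_frame = num_splats * floats_per_splat * bytes_per_float
    return num_splats, bytes_per_frame, bytes_per_frame * fps


for size in (128, 256, 512):
    n, per_frame, per_second = splat_budget(size)
    print(f"{size}x{size}: {n:,} splats, "
          f"{per_frame / 2**20:.1f} MiB/frame, {per_second / 2**20:.0f} MiB/s")
```

Under these assumptions, a 512×512 map works out to 14 MiB of splat data per frame, or 840 MiB/s at 60 FPS, and the cost grows with the square of the UV map side length.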

Our optimizations to 3D Gaussian splatting

To drive an avatar with facial expressions on device, we developed the flow below. We need a high-fidelity expression encoder to map the image to an expression vector, such as blendshapes and gaze vectors. We chose these because they are easily supported by standards such as OpenXR [6]. The decoder takes the expression vector as well as the avatar assets and generates the splats.

To utilize the processing available across the Snapdragon platform, we subdivide the computation into different blocks. We run the expression encoder as well as the avatar decoder on the Neural Processing Unit (NPU) of a device powered by Snapdragon. The 3DGS rendering runs on the Graphics Processing Unit (GPU). This way, we benefit from the different processors running concurrently. The data flow from NPU to GPU can be managed with shared memory. To reduce the data bandwidth between NPU and GPU, one can use existing techniques [7-9].

Figure 2: Comparison diagram of processing across Snapdragon platform
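The benefit of running the two processors concurrently can be illustrated with a small scheduling simulation. The latencies below (10 ms on the NPU, 8 ms on the GPU) are made-up numbers for illustration, and the model is deliberately simple: it ignores data transfer cost and lets the NPU run ahead without any buffer limit.

```python
def finish_times(num_frames, npu_ms, gpu_ms, concurrent):
    """Return the time (ms) at which each frame finishes rendering.

    concurrent=True: the NPU starts frame k+1 while the GPU renders frame k.
    concurrent=False: frame k+1 starts only after frame k is fully rendered.
    """
    npu_free = 0.0  # time at which the NPU finished its previous frame
    gpu_free = 0.0  # time at which the GPU finished its previous frame
    done = []
    for _ in range(num_frames):
        npu_start = npu_free if concurrent else max(npu_free, gpu_free)
        npu_end = npu_start + npu_ms
        # The GPU needs the NPU output and must be idle.
        gpu_start = max(npu_end, gpu_free)
        gpu_end = gpu_start + gpu_ms
        npu_free, gpu_free = npu_end, gpu_end
        done.append(gpu_end)
    return done


print(finish_times(3, 10.0, 8.0, concurrent=False))  # [18.0, 36.0, 54.0]
print(finish_times(3, 10.0, 8.0, concurrent=True))   # [18.0, 28.0, 38.0]
```

In this toy model the latency of a single frame is unchanged (18 ms), but the interval between finished frames drops from the sum of the two stages (18 ms) to the slower stage alone (10 ms).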

For the encoder and decoder to run on the NPU, we additionally need to make the AI model compatible with the Qualcomm AI Engine Direct SDK (e.g., by quantizing it) [10]. To quantize while retaining the model accuracy, we use the AI Model Efficiency Toolkit (AIMET) [11]. As shown in the diagram below, one can first use any ML library to train the 3DGS decoder. Once quality is satisfactory, quantization-aware training (QAT) follows using AIMET [11] to generate a quantized model that can efficiently run on the NPU of an edge device powered by Snapdragon [10].

Figure 3: Using any ML library to train the 3DGS decoder
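For readers new to quantization, the sketch below shows the quantize-then-dequantize step that QAT generally simulates during training, so the model learns weights that survive rounding to a small set of integer levels. This is plain NumPy written for illustration; it is not the AIMET API, and the 8-bit asymmetric per-tensor scheme is an assumption.

```python
import numpy as np


def fake_quantize(x, num_bits=8):
    """Round x to num_bits unsigned integer levels, then map back to floats."""
    qmin, qmax = 0, 2**num_bits - 1
    # The quantization range must contain zero so that zero is representable.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        return x.copy()  # all-zero input: nothing to quantize
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return ((q - zero_point) * scale).astype(x.dtype)


weights = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
restored = fake_quantize(weights)
step = 2.0 / 255.0  # scale for the range [-1, 1] at 8 bits
max_error = float(np.abs(weights - restored).max())
print(f"step={step:.5f}, max error={max_error:.5f}")
```

The rounding error per value stays within one quantization step, which is the kind of perturbation the decoder has to tolerate after it is quantized for the NPU.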

The world’s first demonstration of a real-time 3D Gaussian splatting avatar running on device

With the concept and optimizations described above, the image and profiling table below show how the overall system can run in real time at 60 FPS on edge devices powered by Snapdragon XR2 Gen 2 and Snapdragon 8 Elite. These numbers correspond to a 512×512 UV map.

Platform	Snapdragon XR2 Gen 2	Snapdragon 8 Elite
Encoder latency (ms)	3.905	1.196
Decoder latency (ms)	13.534	7.58
3DGS renderer latency (ms)	8.85	7.04
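The latencies in the table can be compared against the 60 FPS frame budget with a short calculation. The sketch below only checks two simple cases: all blocks run back to back, and each block considered on its own. How the encoder and decoder share the NPU is not detailed in this post, so the code makes no assumption about that scheduling.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.67 ms per frame at 60 FPS

# Values copied from the profiling table above.
latency_ms = {
    "Snapdragon XR2 Gen 2": {"encoder": 3.905, "decoder": 13.534, "renderer": 8.85},
    "Snapdragon 8 Elite": {"encoder": 1.196, "decoder": 7.58, "renderer": 7.04},
}


def summarize(blocks):
    """Compare the serial sum and the slowest block to the frame budget."""
    serial_ms = sum(blocks.values())
    slowest_ms = max(blocks.values())
    return {
        "serial_ms": round(serial_ms, 3),
        "serial_fits_60fps": serial_ms <= FRAME_BUDGET_MS,
        "slowest_block_ms": slowest_ms,
        "each_block_fits_60fps": slowest_ms <= FRAME_BUDGET_MS,
    }


for platform, blocks in latency_ms.items():
    print(platform, summarize(blocks))
```

With the XR2 Gen 2 numbers, the back-to-back sum (26.289 ms) exceeds the 16.67 ms budget while every individual block fits within it, which suggests why overlapping work across processors matters on that platform. With the 8 Elite numbers, even the back-to-back sum (15.816 ms) fits.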

Figure 4: The world’s first demonstration of a real-time 3D Gaussian splatting avatar running on device

We also show a live video demonstration of the overall system running on a phone equipped with the Snapdragon 8 Elite platform, where a user can drive various avatars. The models featured in this demonstration gave permission to Qualcomm Technologies to use their images and corresponding meshes for the purpose of the 3D avatar demonstration.

On-device 3D Gaussian splatting demo

Nov 26, 2024 | 2:18

What’s next?

Our intent is to make this research a commercial reality. We envision people having truly immersive conversations on XR devices, where lifelike facial avatars make it feel like everyone is in the same room even when they are countries apart.

Let us know what you think! Join our developer community on Developer Discord and sign up for our AI newsletter: What’s next in AI and computing

-------------------------------------------------------------------------

[1] B. Kerbl, G. Kopanas, T. Leimkühler, G. Drettakis, “3D Gaussian Splatting for Real-Time Radiance Field Rendering”, in SIGGRAPH, July 2023

[2] S. Saito, G. Schwartz, T. Simon, J. Li, G. Nam, “Relightable Gaussian Codec Avatars”, in CVPR, June 2024

[3] S. Giebenhain, T. Kirschstein, M. Rünz, L. Agapito, M. Nießner, “NPGA: Neural Parametric Gaussian Avatars”, in SIGGRAPH Asia, Dec. 2024

[4] J. Li, C. Cao, G. Schwartz, R. Khirodkar, C. Richardt, T. Simon, Y. Sheikh, S. Saito, “URAvatar: Universal Relightable Gaussian Codec Avatars”, in SIGGRAPH Asia, Dec. 2024

[5] B. Egger, W. Smith, A. Tewari, S. Wuhrer, M. Zollhoeffer, T. Beeler, F. Bernard, T. Bolkart, A. Kortylewski, S. Romdhani, C. Theobalt, V. Blanz, T. Vetter, “3D Morphable Face Models—Past, Present, and Future”, ACM Transactions on Graphics, vol. 39, no. 5, June 2020

[6] The OpenXR 1.1.42 Specification, https://registry.khronos.org/OpenXR/specs/1.1/html/xrspec.html#XR_FB_face_tracking, last accessed Nov. 2024

[7] M. Sarkis, W. Zia, K. Diepold, “Fast Depth Map Compression and Meshing with Compressed Tritree”, in ACCV, Nov. 2009

[8] M. G. Kim, S. Jeong, S. Park, J. Han, “Superpixel-guided Sampling for Compact 3D Gaussian Splatting”, in ACM Symposium on Virtual Reality Software and Technology, Oct. 2024

[9] J. C. Lee, D. Rho, X. Sun, J. H. Ko, E. Park, “Compact 3D Gaussian Representation for Radiance Field”, in CVPR, June 2024

[10] Qualcomm® AI Engine Direct SDK, https://www.qualcomm.com/developer/software/qualcomm-ai-engine-direct-sdk, last accessed Nov. 2024

[11] Qualcomm® AI Model Efficiency Toolkit (AIMET), https://quic.github.io/aimet-pages/, last accessed Nov. 2024

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. AIMET is a product of Qualcomm Innovation Center, Inc.

About the Author

Michel Sarkis, Principal Engineer/Manager