Qualcomm first started talking about scene-based audio for immersive sound over a year ago and has made significant progress since. It is now being seen as an ideal technology for immersive virtual reality (VR). As a quick reminder, scene-based audio uses Higher Order Ambisonics (HOA) to create a compact, high-fidelity representation of the entire audio scene, and then optimally recreates that scene as needed at the location of playback. This means that the sound you hear is essentially true to life: high-fidelity 3D sound arriving from the correct directions, which enables highly immersive experiences.
Check out the video below for a quick overview of scene-based audio, and please wear headphones for the best binaural audio experience.
Scene-based audio progress and momentum
As an emerging technology, scene-based audio has several ongoing paths to market adoption. MPEG-H 3D Audio, which supports scene-based audio, was published as an International Standard in October 2015. MPEG-H is one of two candidates chosen for the audio codec of ATSC 3.0 (Advanced Television Systems Committee). ATSC 3.0 is the specification for next-generation digital television over-the-air transmission in several countries, including the U.S. and South Korea. As a sign of adoption, all South Korean broadcasters will implement MPEG-H 3D Audio as the country’s broadcast audio format. This is especially exciting since the 2018 Winter Olympics in Korea will be a showcase event that demonstrates the immersive and interactive capabilities of MPEG-H 3D Audio. In addition, to support broad device interoperability, MPEG-H is included in the DisplayPort and HDMI specifications.
Scene-based audio is also gaining traction outside of standards. For example, YouTube supports First Order Ambisonics (FOA) for spatial audio in 360° videos. More recently, YouTube launched spatial audio for on-demand YouTube videos and is accumulating content. While basic FOA already offers the benefit of 3D audio in a single format that adjusts to the speaker configuration of different devices, the evolution to HOA allows for better spatial precision and audio fidelity. The benefits of HOA are compelling, so it is only a matter of time before adoption reaches its tipping point.
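To make the step from FOA to HOA a little more concrete, here is a minimal sketch of how the number of coefficient channels grows with the ambisonic order. It relies on the standard full-sphere ambisonics relation of (N + 1)^2 channels for order N; the function name `hoa_channel_count` is our own and is not taken from any particular library.

```python
def hoa_channel_count(order: int) -> int:
    """Number of coefficient channels in a full-sphere ambisonic scene of a given order."""
    if order < 0:
        raise ValueError("ambisonic order must be non-negative")
    return (order + 1) ** 2


if __name__ == "__main__":
    # Order 1 is FOA; higher orders add channels and, with them, spatial detail.
    for order in range(1, 5):
        print(f"order {order}: {hoa_channel_count(order)} channels")
```

Each extra order adds channels that carry finer spatial detail, which is the trade-off between precision and data volume behind the move from FOA to HOA.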
NAB 2016 demos and news
At the National Association of Broadcasters Show (NAB Show) in April 2016, Qualcomm Technologies had many exciting new demos showcasing scene-based audio and MPEG-H 3D Audio. To show the true power of scene-based audio, we set up a simulated living room with 22 configurable speakers so we could create audio experiences all the way from simple mono to fully immersive setups (like 7.1.4 and 22.2).
We demonstrated the end-to-end TV broadcast chain with both live and pre-produced content. The key features in the broadcast chain that we showed were:
- Live scene-based audio capture and mixing using an Avid console
- Scene-based audio cinematic content mixed using our Pro Tools HOA plugin suite
- Real-time MPEG-H audio encoding, monitoring, and decoding
- Transport of HOA through a Serial Digital Interface (SDI) framework
For playback in our simulated living room, we demonstrated several key features enabled by scene-based audio. From a single audio file, flexible rendering dynamically generated the appropriate speaker feeds based on the number and location of the speakers. We also showcased several interactive features, such as rotating and rendering the sound field in sync with the 360° video viewed on TV, acoustically zooming into sections of the visual scene based on real-time user input, and selecting a preferred commentary language. With compression at multiple bitrates, we also showed how we could reduce transmission bandwidth while still providing a high-quality 3D audio experience with HOA.
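As a rough illustration of flexible rendering, the sketch below turns one first-order scene sample into speaker feeds for two different layouts. It is not the MPEG-H renderer or the demo's implementation: it swaps in a simple virtual-cardioid decoder (one virtual microphone aimed at each speaker) and assumes a (W, Y, Z, X) channel order with SN3D-style scaling, with x pointing front, y left, and z up. The names `speaker_direction`, `render_to_speakers`, `STEREO`, and `QUAD` are hypothetical.

```python
import math


def speaker_direction(azimuth_deg, elevation_deg=0.0):
    """Unit vector (x, y, z) toward a speaker; azimuth is counterclockwise from front."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el))


def render_to_speakers(foa_sample, speaker_angles):
    """Decode one FOA sample (W, Y, Z, X) into one feed value per speaker.

    Each feed is what a virtual cardioid microphone aimed at that speaker would pick up.
    """
    w, y, z, x = foa_sample
    feeds = []
    for azimuth_deg, elevation_deg in speaker_angles:
        sx, sy, sz = speaker_direction(azimuth_deg, elevation_deg)
        feeds.append(0.5 * (w + x * sx + y * sy + z * sz))
    return feeds


# Assumed example layouts as (azimuth, elevation) pairs in degrees.
STEREO = [(30.0, 0.0), (-30.0, 0.0)]
QUAD = [(45.0, 0.0), (-45.0, 0.0), (135.0, 0.0), (-135.0, 0.0)]

if __name__ == "__main__":
    scene_sample = (1.0, 1.0, 0.0, 0.0)  # a unit-amplitude source hard left
    print("stereo feeds:", render_to_speakers(scene_sample, STEREO))
    print("quad feeds:", render_to_speakers(scene_sample, QUAD))
```

The same scene sample feeds both layouts; only the list of speaker positions changes. A production renderer would also handle higher orders, loudness normalization across layouts, and irregular speaker placement, none of which this sketch attempts.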
In addition to the living room speaker setup, we also demonstrated how scene-based audio is making VR experiences truly immersive. On a motion-tracking VR headset powered by a Qualcomm Snapdragon 820 processor, we created realistic binaural audio and 360° video experiences.
Scene-based audio is ideal for virtual reality
VR has created tremendous excitement in the industry, and realistic positional audio is one of the essential ingredients for creating truly immersive VR experiences. Positional audio enables the correct localization of sound from all directions and is made possible by dynamically adjusting the sound field based on the head position relative to the 3D virtual world. For example, if a plane is flying by, then the sound from the plane needs to be repositioned as both your head and the plane move. Scene-based audio offers many benefits for capturing and playing back positional audio.
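The plane example can be sketched in a few lines. This is a simplified illustration that assumes positions in meters with x pointing front, y left, and z up, and that tracks only head yaw (so elevation is unaffected); the function name `relative_direction` is hypothetical.

```python
import math


def relative_direction(source_pos, head_pos, head_yaw_deg):
    """Return (azimuth, elevation) in degrees of a source as heard from the head.

    Azimuth is counterclockwise from straight ahead, wrapped to [-180, 180).
    Only head yaw is modeled here, which is an assumption of this sketch.
    """
    dx = source_pos[0] - head_pos[0]
    dy = source_pos[1] - head_pos[1]
    dz = source_pos[2] - head_pos[2]
    world_azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    relative_azimuth = (world_azimuth - head_yaw_deg + 180.0) % 360.0 - 180.0
    return relative_azimuth, elevation


if __name__ == "__main__":
    head = (0.0, 0.0, 0.0)
    # The plane starts in front of the listener and flies toward the left
    # while the listener turns to follow it.
    for plane, yaw in [((10.0, 0.0, 0.0), 0.0),
                       ((10.0, 0.0, 0.0), 90.0),
                       ((0.0, 10.0, 0.0), 90.0)]:
        print(plane, yaw, relative_direction(plane, head, yaw))
```

Both the plane's position and the head yaw feed into the result, so the perceived direction has to be recomputed whenever either one changes.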
For capture, scene-based audio inherently records the entire sound scene and aims to achieve as realistic a representation of the original sound field as possible, and it does this in real time. This is especially important for live-broadcast events like sports, concerts, and user-generated content (such as a wedding) where you don’t have time for audio post-processing. Other 3D audio formats require post-processing to isolate and spatialize sound objects in the 3D audio scene, which is extremely difficult for these live events. Also, scene-based audio offers the flexibility to capture from a single microphone array, distributed microphones, or both, which allows for simple capture setups and devices. Additional audio inputs (also known as stems) can also be mixed into the scene-based audio format with a variety of new tools that are becoming available.
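Mixing a stem into a scene can be illustrated at first order. The sketch below assumes a (W, Y, Z, X) channel order with SN3D-style scaling and a plane-wave source model; it is not the plugin suite mentioned earlier, and the names `encode_foa` and `mix_scenes` are hypothetical.

```python
import math


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample as a first-order scene sample (W, Y, Z, X)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (sample,
            sample * math.sin(az) * math.cos(el),
            sample * math.sin(el),
            sample * math.cos(az) * math.cos(el))


def mix_scenes(*scenes):
    """Sum scene samples channel by channel; the representation is linear."""
    return tuple(sum(channel) for channel in zip(*scenes))


if __name__ == "__main__":
    captured = (0.2, 0.0, 0.0, 0.1)     # made-up sample from a microphone array
    commentary = encode_foa(0.5, 0.0)   # a stem placed straight ahead
    print(mix_scenes(captured, commentary))
```

Because the mix is a plain channel-wise sum, the result has the same number of channels no matter how many stems are added.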
For VR playback, the audio scene is rotated to match the user’s head position, and then a binaural signal is rendered from the adjusted audio scene and played back to each ear. With HOA coefficients, rotating, stretching, or compressing the audio scene is computationally efficient. Thus, with scene-based audio it is easy to rotate and binauralize the audio scene irrespective of its complexity (by rotate, we are referring to yaw, pitch, and roll rotations in the image below).
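Here is a minimal sketch of the rotation step, limited to yaw and to first order. It assumes a (W, Y, Z, X) channel order, x pointing front and y pointing left, and a positive yaw meaning the head turns to the left; the function name `rotate_for_head_yaw` is hypothetical, and binaural rendering is left out.

```python
import math


def rotate_for_head_yaw(foa_sample, head_yaw_deg):
    """Counter-rotate one FOA sample (W, Y, Z, X) so the scene stays fixed in the world.

    Turning the head left by some angle is equivalent to rotating the scene
    right by the same angle. W and Z do not change under yaw.
    """
    w, y, z, x = foa_sample
    theta = math.radians(head_yaw_deg)
    c, s = math.cos(theta), math.sin(theta)
    x_rot = x * c + y * s
    y_rot = -x * s + y * c
    return (w, y_rot, z, x_rot)


if __name__ == "__main__":
    front_source = (1.0, 0.0, 0.0, 1.0)  # unit source straight ahead
    # After the head turns 90 degrees left, the source should sit to the right (Y near -1).
    print(rotate_for_head_yaw(front_source, 90.0))
```

The work per sample is a handful of multiplications on the coefficients, and it does not depend on how many sources are in the scene. Pitch and roll follow the same pattern on the other axis pairs, and higher orders use larger rotation matrices.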
We’re very excited to see what’s possible as scene-based audio continues to gain momentum. Scene-based audio is one of many technologies that make up the Qualcomm Aqstic™ portfolio. The Aqstic portfolio shows how QTI takes a holistic approach to providing the best audio quality, from capture and playback technology, like scene-based audio, to the Aqstic audio codec and smart power amplifiers. Want to learn more? Be sure to keep checking our scene-based audio website and white paper for updates. Also, sign up for our newsletter to receive the latest information about mobile computing.