Nov 5, 2018
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
AI’s potential is undeniable. In fact, we’re already seeing it invisibly integrated into our day-to-day lives. From things as commonplace as customized entertainment recommendations to fast medical diagnoses and security systems in airports, AI is all around us. What if it could simplify and enrich our lives even further, so that we could have more time to relax, pursue hobbies, and spend time with the people we love?
This challenge is being tackled at the QUVA Lab, established by Qualcomm Technologies and the University of Amsterdam. Championed by three of the university’s professors of machine learning — Arnold Smeulders, Cees Snoek, and Max Welling (now also a VP of Technology at Qualcomm Technologies Netherlands, B.V.) — and by Jeff Gehlhaar, a VP of Technology at Qualcomm Technologies, the QUVA Lab was launched in 2016 to support world-class research in deep vision, which applies deep learning to computer vision. The lab’s mission is in line with one of Qualcomm’s missions: enriching lives with technology — in this case, with devices that can help us better organize our visual memories, security cameras that evaluate threat levels, and cars that can navigate well on their own in unforeseen conditions.
Currently, the lab sponsors 13 PhD students and postdoctoral researchers working in the field of deep vision. The lab boasts more than 25 publications accepted at top machine learning and AI conferences, including CVPR, ICLR, ICML, ECCV, and ICCV. In addition, the lab organizes a monthly deep vision seminar with international speakers from top universities and companies. Anyone passionate about AI is welcome. Some highlights from this year’s research include: robust mobile tracking, personal event detection and recounting, and hyperparameter optimization.
Robust mobile tracking
Robust tracking of movement essentially means following a target object consistently throughout long videos. It can be challenging because the target object may have any visual appearance and semantics, while the video may additionally suffer from low resolution, shaky capture, or otherwise poor quality. Many of these challenges can be addressed by rethinking the very concept of tracking and approaching it from an instance-search perspective, as presented in the QUVA SINT tracker detailed in the Siamese Instance Search for Tracking paper.
Traditionally, visual object trackers rely on handcrafted and brittle temporal mechanisms for updating the model of the target object over time. The basic idea of SINT is to discard any notion of time and such brittle temporal mechanisms, and instead use deep learning to visually match the target object throughout the video based on appearance similarity alone. The SINT tracker thus avoids the need for noisy and unreliable updates and achieves excellent results, especially on longer videos, where most trackers struggle.
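The matching-by-similarity idea can be sketched with a toy example. Note this is only an illustration, not SINT itself: the random projection below stands in for the learned Siamese embedding network, and the patch sizes and names are made up for the sketch.

```python
import numpy as np

def embed(patch, W):
    """Stand-in for a learned Siamese embedding network: a fixed
    linear projection followed by L2 normalization."""
    v = W @ patch.ravel()
    return v / np.linalg.norm(v)

def match_step(template_emb, candidates, W):
    """Score each candidate patch by cosine similarity to the target
    template and return the best match. No temporal model update is
    involved -- matching is by appearance only."""
    scores = [float(template_emb @ embed(c, W)) for c in candidates]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))     # toy embedding: 8x8 patch -> 16-d

target = rng.standard_normal((8, 8))  # target patch from the first frame
template_emb = embed(target, W)

# Candidate patches in a later frame: distractors plus a slightly
# perturbed copy of the target.
candidates = [rng.standard_normal((8, 8)) for _ in range(4)]
candidates.append(target + 0.05 * rng.standard_normal((8, 8)))

best, scores = match_step(template_emb, candidates, W)
print(best)  # the perturbed target (index 4) should score highest
```

Because matching depends only on appearance, the same template can be compared against candidates in any frame, near or far in time, which is what lets this style of tracker handle long videos without accumulating update errors.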
There are many applications of robust mobile tracking. For instance, consider autonomous cars trying to make sense of their surroundings. They must track how pedestrians and other cars move, so that they can successfully navigate in various weather conditions and in a timely manner. Another exciting application of robust mobile tracking is improving household robots. Robots need to have a full and active understanding of their environment, distinguishing between a child, a kitchen counter, or the family’s new puppy. Without this information, navigation and manipulation of objects in the environment is challenging. Social robotics will depend on extremely accurate object tracking to be considered safe enough for use in people’s homes.
Personal event detection and recounting
You’ve probably noticed smart photo tagging and organizing features on social media. Facebook, for example, recognizes your friends in your uploaded photos, and Google Photos acts as an AI assistant, organizing your photos by content, location, and more. An even more difficult challenge is accurately doing the same with videos. You may have a lot of videos with your dog, but say you’re looking for the one in which you and your dog visited the beach for the first time. Soon AI will be able to help you search through your videos and organize them. Telling the machine which event is of interest is an open problem. Traditionally, you would have to first provide video examples that already contain the event of interest; a more flexible and scalable solution is to describe the event and what it looks like in natural language, as explained in the Tracking by Natural Language Specification paper from the QUVA Lab.
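The language-based search idea boils down to scoring stored videos against a text query in a shared representation. The sketch below uses bag-of-words vectors as a crude stand-in for the learned text and video embeddings a real system would use; the clip names and descriptions are hypothetical.

```python
from collections import Counter
import math

def bow_vector(text):
    """Toy stand-in for a learned embedding: a bag-of-words count
    vector over lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# Hypothetical per-video descriptions. In a real system these would be
# embeddings produced by a video-understanding model, not text.
library = {
    "clip_001": "dog running in the park chasing a ball",
    "clip_002": "dog and owner at the beach for the first time",
    "clip_003": "birthday party with cake and candles",
}

query = "my dog at the beach"
q = bow_vector(query)
ranked = sorted(library, key=lambda k: cosine(q, bow_vector(library[k])),
                reverse=True)
print(ranked[0])  # clip_002
```

The point of the language interface is that the query above needs no example video of a beach trip: the description alone specifies the event of interest.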
However, the application of this technology is not limited to social media. Imagine the impact it could have on safety and security. A drone might help watch over children in schools and playgrounds. A swarm of drones could have sophisticated surveillance capabilities, such as advanced activity-understanding algorithms, which minimize false alarms by recognizing activities of interest. For instance, you don’t want the drones to signal an emergency because they can’t differentiate between a game and a child in actual danger.
Hyperparameter optimization
Consider this challenge: How do you improve a process in which you don’t know the internal mechanisms and can only observe what happens when the external parameter “knobs” are turned? This type of process is known as a “black box” process. Have you ever found yourself saying “I don’t know exactly how I did it, but it works”? It’s like that.
Traditional learning methods, like standard gradient-based optimization, can only optimize a process for accuracy when internal knowledge of the system is available; that is, there is no black box. Hyperparameter optimization, on the other hand, can optimize multi-factorial objectives on a black box process. This is especially useful for large-scale, complex processes that rely on many independent and interconnected components, which can’t be easily modeled.
For example, hyperparameter optimization can be used to maximize the battery life of your smartphone by finding the configuration of a deep learning network that best balances accuracy, speed, and power efficiency on a specific chip architecture. Traditional gradient-based learning would require an actual mathematical model of the network and the chip, which quite often isn’t available. Another example is Uber, which already uses hyperparameter optimization to ensure a smooth user experience. With the emergence of the gig economy, many future services could be organized and orchestrated by machine learning on live data collected from IoT devices. Traditional methods can’t optimize these processes, since they would require domain knowledge that may be impossible to obtain.
The QUVA Lab has developed a unique and novel solution to the problem of hyperparameter optimization, as described in BOCK: Bayesian Optimization with Cylindrical Kernels. While most classical approaches struggle with black box problems that have many parameter knobs, BOCK scales up gracefully because it performs the optimization on warped geometrical landscapes, allowing faster and more efficient travel between solutions in the search space.
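To make the black-box setting concrete, here is a minimal sketch of the simplest black-box optimizer, random search, applied to a made-up objective. This is only a baseline to illustrate the problem setup, not BOCK; the objective function and parameter ranges are invented for the example.

```python
import math
import random

def black_box(params):
    """Hypothetical black-box objective: we can only evaluate it, not
    inspect its internals or gradients. It stands in for something like
    accuracy-per-watt of a network configuration on a device."""
    lr, width = params
    return -((math.log10(lr) + 3) ** 2) - 0.1 * (width - 128) ** 2 / 1000

def random_search(objective, n_trials, seed=0):
    """Baseline black-box optimizer: sample the parameter 'knobs' at
    random and keep the best configuration observed so far."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = (10 ** rng.uniform(-6, 0),  # learning rate, log scale
                  rng.randrange(16, 513))    # layer width
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(black_box, n_trials=200)
print(best_params, best_score)
```

Random search only needs the ability to evaluate the objective, which is exactly the black-box assumption; methods like BOCK improve on it by using the evaluations seen so far to decide where to sample next, rather than sampling blindly.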
Accelerating the path from research to commercialization
Qualcomm Technologies is working with QUVA lab toward a seamless collaboration in which the latest academic research finds its way into new and exciting products. The Qualcomm AI Research team and the lab team exchange ideas, findings, and data with the intent to commercialize these advancements at scale. Academic advancements in the field of machine learning are something to strive for, especially when they could ultimately make our lives safer, easier, and more fun.
Article written in collaboration with Efstratios Gavves, Scientific Manager of QUVA Lab and Assistant Professor at the University of Amsterdam.