Oct 6, 2020
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
Back in 2014, Qualcomm Technologies acquired a spin-off startup of the University of Amsterdam called Euvision Technologies. This was a visionary R&D bet for Qualcomm. Euvision was conducting machine learning research for computer vision, which was a relatively new topic for Qualcomm at the time. The topic of machine learning was not making the mainstream headlines very often. For context, it would take another year for Amazon to launch Alexa and another two years before Uber started testing driverless cars. Qualcomm realized that advanced algorithms for computer vision and machine learning would need to run on power-efficient hardware soon. Qualcomm teamed up with the University of Amsterdam (UvA) to launch a joint deep vision research lab, which they called QUVA lab.
The lab, led by professor Arnold Smeulders, professor Max Welling, professor Cees Snoek, and assistant professor Efstratios Gavves, was given autonomy to develop its own work. This method of research matched Qualcomm Technologies’ deep commitment to R&D and was a way of supporting high-quality scientific research, without the constraints of short-term business goals. The bet paid off. QUVA lab went on to publish papers at prestigious conferences, such as NeurIPS, ICML, CVPR and ICCV, and produced novel intellectual property, often in collaboration with Qualcomm AI Research, an established effort at Qualcomm Technologies with researchers based around the world. QUVA lab’s focus was initially at the intersection between computer vision and deep learning. Over time, it has extended to more topics in fundamental machine learning, such as privacy-preserving deep learning, distributed deep learning, and equivariant deep learning algorithms.
For 2020, Qualcomm Technologies is proudly renewing our commitment to the QUVA lab for another five years, with a lot of exciting research plans ahead. We sat down virtually with professor Arnold Smeulders (AS), professor Cees Snoek (CS), and professor Max Welling (MW), a VP of Technology at Qualcomm Technologies Netherlands B.V. and a professor at UvA, to reflect on the last five years in deep vision as well as what the future of AI holds.
How has the perception of computer vision/deep vision changed over the past five years, since the lab started?
Arnold Smeulders: Computer vision in the nineties was very much occupied with solving the general segmentation problem. That is because segmentation is the key notion that humans use for visual recognition. “A cow is an object with four legs.” In fact, this is wrong most of the time, as pictures of cows often show only two legs, and it is an irrelevant explanation as it does not define a leg. It is also ineffective as it generates many exceptions: there are many things with four legs. So, this design-component, rule-based approach has very limited use for vision. And slowly it was discovered that segmentation is not really needed in many computer vision tasks. Recognizing a cow relies on identifying the animal but equally so on the green grass it may be standing on. To grasp an object, you only need to identify where the outside borders are, and so on. Effectively, the systems of 10 years ago were already stacked classification algorithms. The famous paper by Krizhevsky et al. came out in 2012, and from that point on it was clear that neural networks are so much better at computer vision than component and rule-based algorithmic imitations of the human explanation of images. Neural networks are homogeneous; hence they allow theory to optimize performance at a much higher and more fundamental level, rather than component by component as the earlier algorithmic models required. QUVA originated shortly after the birth of neural networks in computer vision.
Cees Snoek: When we started the QUVA lab in 2015, deep learning was already making a profound impact on computer vision, but there were still alternative technologies. What has surprised most of us was the speed with which deep learning progress continued and the research momentum it would generate in both academia and industry. Nowadays, it is almost impossible to publish a computer vision paper without a deep learning connection.
Max Welling: The whole field is now completely dominated by deep learning, which is ideally suited for image data. The field has developed rapidly: it can analyze video sequences in real time for self-driving cars, it can generate photorealistic images of faces that are indistinguishable from real faces, and much more.
Can you give examples of research projects that you’re proud of, or that you learned a lot from?
AS: In the early years, Rein van den Boomgaard derived a relationship between min-max filters and Burgers’ partial differential equation. I was very proud to connect these extreme ends of mathematical descriptions of an image field. Much later, Jasper Uijlings had the flash of inspiration to define bottom-up grouping in an image on simple self-orderings in the image. Cees and I won the TRECVID competition together for an entire decade through hard work. Other than that, any time a PhD student surprises themselves with a finding that was previously inconceivable, I am proud.
CS: I am proud of all my research work, and only by learning from failure can we make real progress. One example in the context of the QUVA lab that made me proud is the work on VideoLSTM. First and foremost, because of the deep vision contributions it brings to the LSTM architecture for activity recognition and localization in video. Second, because it was realized by a collaboration with students from the QUVA lab and researchers from Qualcomm AI Research Amsterdam. And last but not least, because it generated three patent applications.
MW: Perhaps two examples of things I’m proud of: we figured out how to perform backpropagation in a spiking neural network, which was a stumbling block before. And second, we designed an algorithm that learns to perform efficient search in combinatorial spaces, which we are now using to improve physical designs for chips. On the failed side, I had high hopes to develop a privacy-preserving deep learning architecture, but the method ended up not being very competitive relative to what was already developed.
The pandemic has seen a speedup in ML research applied to fields such as digital health, contact tracing, etc. What do you think the impact will be (if any) on the research of QUVA lab?
CS: The mission of the QUVA lab is to perform world-class research on deep vision. In science, it is important to focus and not be too distracted by the issues of the day. This is not to underestimate the impact and importance of COVID-19, of course, but when one focuses on fundamental problems the applications will come naturally. For digital health in the context of COVID-19, the QUVA lab’s work on federated learning has direct utility for contact tracing and our work on activity recognition allows for monitoring social distancing behavior, to name just two examples.
What do you see as being the most promising trends in deep vision now?
AS: The most promising trend is the integration of existing knowledge on images, purposes, and image formation principles with the universal principles steering the current development of neural networks. So, to find the right balance between generalization and differentiation and apply that to the purpose and circumstance is the greatest challenge. We are still in the early days of neural network architectures in computer vision.
CS: Today’s deep vision thrives on well-defined classification problems where one learns to associate labels with pixels using huge amounts of compute. For me, the most promising trends challenge this status quo by moving from learning to associate, into predicting the cause of an effect, while reducing the need for label-supervision by means of physical simulations, generative models or self-supervision, and doing so in a compute-efficient fashion.
MW: I will answer for ML in general. I believe that two trends are very interesting: the use of reinforcement learning and Bayesian optimization for combinatorial optimization problems, and the use of quantum mechanics and quantum computation for machine learning.
Congrats, QUVA lab was just extended for five years. What’s the lab’s vision for the coming years?
CS & AS: Thank you, congratulations to Qualcomm as well! The lab’s vision builds on the trends and challenges identified above, with a strong emphasis on deep machine learning foundations, covering topics like continuous video learning, unsupervised video compression, and self-supervised activity recognition. Vacancies for the lab will open soon, and I hope that many talented students will apply to help us realize the vision.
MW: I really look forward to more intense collaborations and new topics such as the ones I described above.