Qualcomm Augmented Reality Lecture Series

Ground and Aerial Robots for Challenging Environments

Date: April 21, 2016, 11:00-12:30

Speaker: Roland Siegwart
Autonomous Systems Lab & Wyss Zurich (www.asl.ethz.ch & www.wysszurich.ch)

Location: TU Vienna
Zemanek Lecture Room (Room Number: HHEG01)
1040 Vienna, Favoritenstraße 9-11, Stiege III, ground floor, light green area

Abstract:
Disaster response operations and industrial inspections are among the most rewarding but also most challenging tasks for autonomous mobile robots. While robots are already doing a wonderful job as factory work-horses or floor-cleaning devices, operations in highly unstructured and unknown environments, such as those typically encountered after disasters, in mines or on offshore oil rigs, are still a major challenge.
In this talk, our latest research results in legged and flying robot systems, designed to operate in complex environments, are presented and discussed.

Our electrically powered quadruped robots are designed for high agility, efficiency and robustness in rough terrain. This is realized through optimal exploitation of the natural dynamics and series elastic actuation. Equipped with laser scanners and cameras, our quadrupeds StarlETH and ANYmal are able to autonomously find their path through rough terrain, climb stairs and build a 3D map of their environment.

For the fast inspection of complex environments, flying robots are probably the most efficient and versatile devices. However, the limited payload and computing power of multi-copters render autonomous navigation quite challenging. Thanks to our custom-designed visual-inertial sensor, real-time on-board localization, mapping and planning have become feasible, enabling our multi-copters to perform advanced rescue and inspection tasks even in GPS-denied environments.

Overcoming the limited power autonomy and flight range of multi-copters is the main focus of our research on unmanned solar airplanes, omnidirectional blimps and hybrid systems. Our most recent design of a fixed-wing solar airplane with a 5.6 m wing span allows for unlimited flight duration, thus enabling aerial search and rescue over large environments. Thanks to on-board visual sensing, these solar airplanes are also capable of flying very close to the ground and planning their path around obstacles.

Bio:
Roland Siegwart (born in 1959) is a professor for autonomous mobile robots at ETH Zurich, founding co-director of the Wyss Translational Center Zurich and a member of the board of directors of multiple high-tech companies. He studied mechanical engineering at ETH, founded a spin-off company, spent ten years as a professor at EPF Lausanne (1996–2006), was vice president of ETH Zurich (2010–2014) and held visiting positions at Stanford University and NASA Ames.

He has coordinated multiple European projects and is a co-founder of half a dozen spin-off companies. He is an IEEE Fellow, a recipient of the IEEE RAS Inaba Technical Award and an officer of the International Foundation of Robotics Research (IFRR). He serves on the editorial boards of multiple journals in robotics and was a general chair of several robotics conferences, including IROS 2002, AIM 2007, FSR 2007 and ISRR 2009. His interests are in the design and navigation of wheeled, walking and flying robots operating in complex and highly dynamic environments.

Download the presentation slides.

Transfer Learning for Visual Scene Understanding

Date: June 16, 2016, 10:30-12:00

Speaker: Christoph Lampert
Institute of Science and Technology Austria (IST Austria)

Location: TU Vienna
Zemanek Lecture Room (Room Number: HHEG01)
1040 Vienna, Favoritenstraße 9-11, Stiege III, ground floor, light green area

Abstract:
It is a long-standing dream of computer vision research to build an automatic system that is able to understand natural images on a semantic level similar to a human. It is clear, however, that our current techniques will not be sufficient to achieve this goal, since they need a lot of manually annotated training data for every task they try to solve. In my talk, I will highlight two recent results from our group that rely on transfer learning to overcome this limitation: multi-task learning with active task selection, and weakly-supervised semantic image segmentation.

Bio:
Christoph Lampert is a professor for Computer Vision and Machine Learning at the Institute of Science and Technology Austria (IST Austria). He received a PhD degree in mathematics from the University of Bonn in 2003. Subsequently, he held postdoctoral positions at the German Research Center for Artificial Intelligence and the Max Planck Institute for Biological Cybernetics before joining IST Austria in 2010.

His research on computer vision and machine learning has won several international and national awards, including best paper prizes at CVPR and ECCV in 2008. In 2012 he was awarded an ERC Starting Grant by the European Research Council. He is an Editor of the International Journal of Computer Vision (IJCV), an Action Editor of the Journal of Machine Learning Research (JMLR), and an Associate Editor in Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

Download the presentation slides.

Visual-Inertial State Estimation for Resource Constrained Mobile Applications

Date: January 28, 2016
Speaker: Stephan Weiss, Alpen-Adria University (AAU)

Abstract: Visual-inertial state estimation has gained significantly in importance in both research and industry over the last few years. With the advent of powerful computation units, even complex approaches are now capable of running on board mobile devices. However, the latency and computational complexity of these algorithms are still an issue for the closed-loop control of highly resource-constrained and agile mobile robots. Moreover, while terrestrial technology provides high-end platforms for such algorithms, space systems still lack computational power.

This talk will discuss a visual-inertial state estimation framework that has very low computational complexity but still offers system self-calibration and, perhaps more importantly, system self-healing upon sensor drop-out or algorithm failure. The framework seamlessly fuses visual-odometry-based position control with velocity control from inertial and optical flow cues to obtain a quickly deployable platform that is robust against otherwise critical events. We will also shed light on the drawbacks arising from the low-complexity requirement, and we will see how this algorithm is intended to navigate a helicopter on Mars, as proposed by NASA-JPL.
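
The framework itself is not spelled out in the abstract, but the basic pattern it builds on, high-rate inertial prediction corrected by low-rate visual position estimates, can be illustrated with a minimal Kalman-filter sketch. This is a generic toy model, not the speaker's estimator; the 1-D state, update rates and noise values are invented for illustration.

```python
import numpy as np

class PositionVelocityKF:
    """Toy 1-D Kalman filter: high-rate IMU prediction, low-rate visual correction."""

    def __init__(self, dt=0.005):                        # 200 Hz prediction (illustrative)
        self.F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity state transition
        self.B = np.array([[0.5 * dt ** 2], [dt]])       # acceleration input model
        self.H = np.array([[1.0, 0.0]])                  # vision observes position only
        self.Q = np.diag([1e-4, 1e-3])                   # process noise (made-up tuning)
        self.R = np.array([[1e-2]])                      # visual measurement noise
        self.x = np.zeros((2, 1))                        # state: [position, velocity]
        self.P = np.eye(2)

    def predict(self, acc):
        """Prediction step driven by one IMU acceleration sample."""
        self.x = self.F @ self.x + self.B * acc
        self.P = self.F @ self.P @ self.F.T + self.Q

    def correct(self, z_pos):
        """Correction step with a (slower) visual-odometry position estimate."""
        y = np.array([[z_pos]]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P

kf = PositionVelocityKF()
for _ in range(40):               # many IMU samples between two visual updates
    kf.predict(acc=0.1)
kf.correct(z_pos=0.002)
print(kf.x.ravel())               # fused position and velocity estimate
```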

Mid and high-level features for dense monocular SLAM

Date: November 19, 2015
Speaker: Javier Civera

Abstract: Visual SLAM has become one of the key technologies in an increasing number of relevant applications (e.g., Augmented/Virtual Reality, Robotics and Autonomous Cars), where the estimated camera motion and map can be used for coherent virtual insertions, collision-free autonomous navigation or high-level robotic tasks.

The traditional focus of visual SLAM has been the geometric modelling of point-based features; it is only very recently that a more semantic understanding and the modelling of scene priors have gained relevance in the community. Learning and understanding patterns beyond a local geometric/photometric scope opens the door to a wider array of applications and improves the performance of traditional methods. In this talk I will review and give details on some of the most relevant papers and my own work in this direction, highlighting the improvements and current limitations of mid-level and high-level scene features in the visual SLAM domain.

Download the presentation slides.

Vision-controlled Micro Flying Robots: from Frame-based to Event-based Vision

Date: October 9, 2015
Speaker: Davide Scaramuzza, University of Zurich

Abstract: Autonomous quadrotors will soon play a major role in search-and-rescue and remote-inspection missions, where a fast response is crucial. Quadrotors have the potential to navigate quickly through unstructured environments, enter and exit buildings through narrow gaps, and fly through collapsed buildings. However, their speed and maneuverability are still far from those of birds. Indeed, agile navigation through unknown, indoor environments poses a number of challenges for robotics research in terms of perception, state estimation, planning, and control. In this talk, I will give an overview of my research activities on visual-inertial navigation of quadrotors, from slow navigation (using standard frame-based cameras) to agile flight (using event-based cameras). Topics covered will be: visual-inertial state estimation (SVO, SVO+GTSAM), absolute scale determination, monocular dense reconstruction (REMODE), active vision and control, and event-based vision.

Download the presentation slides.

ORB-SLAM: A Real-Time Accurate Monocular SLAM System

Date: June 11, 2015
Speaker: Juan Domingo Tardós Solano, Universidad de Zaragoza

Abstract: In this talk, I will present ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments, is robust to severe motion clutter, allows wide-baseline loop closing and relocalisation, and includes fully automatic initialisation. Our work builds on excellent algorithms of recent years to design from scratch a novel system that uses the same features for all the SLAM tasks: tracking, mapping, relocalisation, and loop closing. Our survival-of-the-fittest strategy for selecting the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, enhancing lifelong operation. We have performed exhaustive evaluations on the most popular datasets, achieving unprecedented performance. For the benefit of the community, we have published the source code.
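
For readers unfamiliar with the feature type, the sketch below shows how ORB features can be extracted and matched between two frames with OpenCV. It only illustrates the shared feature representation, not the ORB-SLAM pipeline itself (the published source code covers that); the image file names are hypothetical.

```python
import cv2

# Hypothetical input frames; any two overlapping grayscale images would do.
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and compute their binary descriptors in both frames.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with the Hamming distance; cross-checking
# keeps only mutually best matches, a cheap stand-in for stricter filtering.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} putative ORB correspondences")
```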

Download the presentation slides.

Delineating Curvilinear Networks using Path Classifiers and Integer Programming

Date: April 21, 2015
Speaker: Pascal Fua, EPFL IC CVLAB, Lausanne, Switzerland

Abstract: We propose a novel Bayesian approach to automated delineation of curvilinear structures that form complex and potentially loopy networks. By representing the image data as a graph of potential paths, we first show how to weight these paths using discriminatively-trained classifiers that are both robust and generic enough to be applied to very different imaging modalities. We then present an Integer Programming approach to finding the optimal subset of paths, subject to structural and topological constraints that eliminate implausible solutions. Unlike earlier approaches that assume a tree topology for the networks, ours explicitly models the fact that the networks may contain loops, and can reconstruct both cyclic and acyclic ones. We demonstrate the effectiveness of our approach on a variety of challenging datasets including aerial images of road networks and micrographs of neural arbors, and show that it outperforms state-of-the-art techniques.
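
To make the flavour of the optimisation concrete, the toy sketch below selects a subset of candidate paths with a binary integer program using the PuLP library. The path weights and the single mutual-exclusion constraint are invented for illustration and are far simpler than the structural and topological constraints used in the actual work.

```python
import pulp

# Candidate paths with classifier-style scores (made-up numbers).
paths = {"p0": 0.9, "p1": 0.4, "p2": 0.7, "p3": -0.3}
# Pairs of paths that must not both be selected, e.g. because they would
# explain the same image evidence twice (purely illustrative constraint).
conflicts = [("p0", "p2")]

prob = pulp.LpProblem("path_selection", pulp.LpMaximize)
x = {p: pulp.LpVariable(p, cat="Binary") for p in paths}   # one binary variable per path

prob += pulp.lpSum(w * x[p] for p, w in paths.items())      # maximize total path score
for a, b in conflicts:
    prob += x[a] + x[b] <= 1                                # mutual exclusion

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = [p for p in paths if x[p].value() == 1]
print(selected)
```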

Download presentation slides.

Working to Solve the Haptics Problem in VR

Date: February 9, 2015
Speaker: Dr. Robert W. Lindeman, Worcester Polytechnic Institute, Massachusetts, USA

Abstract: Much of the work done in virtual and augmented reality has focused on the visual sense. This is no surprise, given how visually dominant humans are, how well we understand the human visual system and how to stimulate it, and the ease with which we can create technology to do so. However, limiting our efforts to one of the five (six?) senses seems like a waste, since the way we experience the real world involves the use of all the senses. This talk will describe ways in which we can enhance mixed reality experiences using the haptic (touch) sense. Haptics is thought to be the most difficult sense to stimulate, for several reasons: so many sensations are categorized within it; the haptic "sensors" in the body cannot be uniformly stimulated as with vision, sound, taste, or smell; more than any other sense, it has a very tight coupling between action and response; and display hardware is very cumbersome, special-purpose, and primitive. Still, there is some hope for using haptics to improve the user experience. In this talk, I discuss the ways in which we have explored this topic, and show some of the baby steps we have taken to deliver on the promise of effective multi-sensory immersive experiences.

Visual SLAM with an event-based camera

Date: January 27, 2015
Speaker: Hanme Kim, Imperial College London

Abstract: An event camera is a silicon retina which outputs not a sequence of video frames like a standard camera, but a stream of asynchronous spikes, each with pixel location, sign and precise timing, indicating when individual pixels record a threshold log-intensity change. By encoding only image change, it offers the potential to transmit the information in a standard video at a vastly reduced bitrate, with the huge added advantages of very high dynamic range and temporal resolution. However, event data calls for new algorithms, and in particular we believe that algorithms which incrementally estimate global scene models are best placed to take full advantage of its properties. We recently showed for the first time that an event stream, with no additional sensing, can be used to track accurate camera rotation while building a persistent and high-quality mosaic of a scene which is super-resolution accurate and has high dynamic range. The method involves parallel camera rotation tracking and template reconstruction from estimated gradients, both operating on an event-by-event basis and based on probabilistic filtering. This talk will give an overview of our BMVC 2014 best industry paper, "Simultaneous Mosaicing and Tracking with an Event Camera", the key ideas and techniques behind it, and some current and future extensions we are working on.
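
The data format described above is easy to make concrete. The sketch below defines a minimal event record and an event-by-event update that accumulates signed log-intensity changes into an image; the sensor resolution and contrast threshold are illustrative, and the paper's probabilistic tracking and mosaicing filters are not reproduced here.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    """One event from a silicon retina: pixel location, polarity, timestamp."""
    x: int
    y: int
    polarity: int    # +1 or -1: direction of the log-intensity change
    t: float         # timestamp in seconds, with microsecond-level precision

H, W = 180, 240                      # sensor resolution (illustrative)
C = 0.15                             # contrast threshold assumed for the sensor
log_intensity = np.zeros((H, W))     # reconstructed relative log intensity

def process(ev: Event) -> None:
    # Event-by-event update: each event says the log intensity at (x, y)
    # changed by roughly +/- C since the last event at that pixel.
    log_intensity[ev.y, ev.x] += ev.polarity * C

# A tracker or mosaicing filter would likewise update its state here,
# one event at a time, instead of waiting for a full frame.
process(Event(x=120, y=90, polarity=+1, t=0.000123))
```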

Download presentation slides.

Computational Neuroscience for Technology: Event-based Vision Sensors and Information Processing

Date: November 28, 2014
Speaker: Jörg Conradt, University of Technology Munich

Abstract: In recent decades the field of Computer Vision has developed sophisticated algorithms for problems such as object tracking or motion extraction based on sequences of high-resolution camera images. Applying such algorithms in real-time robotics reveals an inherent problem: they typically require high data bandwidth and high processing power, which results in substantial computing machinery and/or delayed processing of data. As part of an optimized biological solution for vision, our brain developed retinal receptor cells that largely respond with asynchronous events (“neural spikes”) to temporal changes of brightness. Such encoding of visual information substantially reduces the amount of transmitted data and simultaneously increases temporal precision.

Recently “silicon retinas” have been developed as specialized vision sensors to provide such neuro-inspired vision input for technical systems. Visual information obtained from these sensors differs substantially from traditional sequences of images, which requires an “event-based” redesign of computer vision algorithms. This talk introduces the neuro-inspired vision sensors and presents event-based algorithms for applications such as real-time computation of optic flow and visual object tracking at high update rates on minimalistic computing hardware.
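
As a concrete, if greatly simplified, illustration of event-based processing, the sketch below updates an object-position estimate with every incoming event instead of once per frame. It is a generic toy tracker with invented parameters, not the speaker's algorithm.

```python
import numpy as np

# Minimal event-driven tracker: the estimate is nudged toward every event
# that falls close to it, so it refreshes at the event rate, not the frame rate.
position = np.array([64.0, 64.0])     # current estimate in pixel coordinates
radius = 10.0                         # events farther away than this are ignored
alpha = 0.05                          # per-event update gain (illustrative)

def on_event(x: int, y: int) -> None:
    global position
    p = np.array([x, y], dtype=float)
    if np.linalg.norm(p - position) < radius:
        position = (1.0 - alpha) * position + alpha * p   # tiny incremental update

# Feeding a burst of nearby events moves the estimate toward their centre.
for ex, ey in [(66, 65), (67, 66), (68, 66), (69, 67)]:
    on_event(ex, ey)
print(position)
```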

Download presentation slides.

Sensory Expansion: Exploring Human Sensory Potential in 3D Interaction

Date: October 23, 2014
Speaker: Dr. Ernst Kruijff, Bonn-Rhein-Sieg University

Abstract: The human body has an enormous potential to process different kinds of stimuli, yet traditionally only part of this potential is used in interactive systems. With the proliferation of alternative 3D input and output techniques, this has changed slightly. For example, unconventional interfaces have appeared for the Wii and Xbox that are driven by the need to offer a great experience to the user, stimulating the senses in novel ways.

It is exactly this extended stimulation of the senses that this talk will be about. I will explore several innovative 3D interface projects that expand the senses by offering sensory-rich feedback. Projects include the use of a wide field-of-view AR visual display, flexible tactile surfaces for mobile devices, an audio-tactile glove for 3D manipulation, and a complex immersive multisensory installation. For all these interfaces, the psycho-physiological foundations will be presented, alongside specific design steps and validation results. In doing so, I will address the underlying design considerations driven by human sensory potential, and illuminate the boundaries we encountered with these approaches and the lessons learned.

Download presentation slides.

Making the Virtual Real: Physical Manifestations of Dynamic Virtual People, Objects, and Places

Date: September 15, 2014
Speaker: Greg Welch, University of Central Florida

Abstract: The “Virtual Reality” and “Augmented Reality” we know today comprise a primarily visual paradigm: people, objects, and places are simulated by rendered computer graphics appearing on head-worn, hand-held, or fixed displays. However, by combining physical shapes, robotics, and computer graphics we can, in some cases, “break free” of these displays, bringing virtual objects into the real world around us. In this talk I will share some ideas and early results aimed at achieving this via a paradigm I like to call Physical-Virtual Reality.

How do we measure depth perception in near-field augmented reality inspired by medical applications?

Date: June 25, 2014
Speaker: Edward Swan, Mississippi State University, USA

Abstract: The perception of depth is an important aspect of augmented reality (AR): many AR applications have an important depth component, yet AR displays do not present the depth of virtual objects with the same fidelity as real objects. Furthermore, the unique AR capability of x-ray vision, which affords many very compelling and interesting applications, has no perceptual equivalent in the real world.

In this talk, I will discuss how we might measure AR depth perception at near-field distances; that is, distances where objects may be touched and manipulated by the hands. I will further discuss how vision scientists have measured depth perception historically, and I will talk about the small number of previous experiments that have examined near-field AR depth perception. I will then cover a number of near-field AR depth perception experiments that have been conducted in my laboratory. To date, these experiments show that a mismatch between focal depth and stereo disparity, which many AR display systems exhibit, causes systematic depth judgment errors. I will also discuss effects of perceptual feedback and perceptual learning, which to date have barely been studied in AR, but which are likely very important for compelling near-field AR applications such as medical and industrial tasks, where users are highly skilled and highly practiced.

Download presentation slides.

Learning Better Image Features

Date: April 24, 2014
Speaker: Vincent Lepetit, TU Graz, Austria

Abstract: Extracting and describing image features such as feature points or edges is a critical step in many Computer Vision systems. In practice, however, this is often done with carefully handcrafted methods. In this talk, I will present Machine Learning techniques that can detect and describe feature points and edges, and that are more robust than the best handcrafted methods.
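
As a rough illustration of what "learning a detector" can mean in practice, the sketch below trains a classifier to score small image patches as keypoints. The training data here is synthetic placeholder data; in a real system the labels would come from annotation or an existing detector, and the model would be designed far more carefully.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
patch_size = 8

# Placeholder training set: flattened 8x8 patches with binary
# "keypoint" / "not keypoint" labels (random here, purely illustrative).
X_train = rng.random((500, patch_size * patch_size))
y_train = rng.integers(0, 2, size=500)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# At test time, every candidate patch receives a keypoint probability;
# detections would be local maxima of this score map (NMS omitted).
X_test = rng.random((10, patch_size * patch_size))
scores = clf.predict_proba(X_test)[:, 1]
print(scores)
```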

Download presentation slides.

Shape Knowledge in Segmentation and Tracking

Date: March 25, 2014
Speaker: Victor Adrian Prisacariu, University of Oxford

Abstract: In this talk I will detail methods for simultaneous 2D/3D segmentation, tracking and reconstruction which incorporate high level shape information.

I base my work on the assumption that the space of possible 2D object shapes can be either generated by projecting down known rigid 3D shapes or learned from 2D shape examples. I minimize the discrimination between statistical foreground and background appearance models with respect to the parameters governing the shape generative process (the 6-degree-of-freedom 3D pose of the 3D shape or the parameters of the learned space). The foreground region is delineated by the zero level set of a signed distance function, and I define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities. I obtain the differentials of this energy with respect to the parameters governing shape and conduct searches for the correct shape using standard non-linear minimization techniques.
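
For orientation, a common form of such a pixel-wise posterior, level-set based energy in the related literature reads as follows (a sketch drawn from the general literature, not a formula quoted from the talk):

```latex
E(\Phi) \;=\; -\sum_{\mathbf{x} \in \Omega}
  \log\Big( H_\epsilon\big(\Phi(\mathbf{x})\big)\, P_f(\mathbf{x})
          + \big(1 - H_\epsilon(\Phi(\mathbf{x}))\big)\, P_b(\mathbf{x}) \Big)
```

Here \Phi is the signed distance function embedding the contour, H_\epsilon a smoothed Heaviside step, and P_f, P_b the pixel-wise posterior foreground and background membership probabilities; the energy is differentiated with respect to the pose or latent shape parameters via the chain rule and minimized with standard non-linear optimization.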

This methodology first leads to a novel rigid 3D object tracker. For a known 3D shape, the optimization aims to find the 3D pose that leads to the 2D projection that best segments a given image. I also show how the approach can be accelerated to a point where real-time processing on a mobile phone becomes possible.

Next, I explore deformable 2D/3D object tracking. I use a non-linear and probabilistic dimensionality reduction technique, called Gaussian Process Latent Variable Models, to learn spaces of shape. Segmentation becomes a minimization of an image-driven energy function in the learned space. I can represent both 2D and 3D shapes, which I compress with Fourier-based transforms to keep inference tractable. I extend this method by learning joint shape-parameter spaces, which, novel to the literature, enable simultaneous segmentation and generic parameter recovery. These parameters can describe anything from 3D articulated pose to eye gaze.

Designing, Implementing and Evaluating Wide-view Optical See-through Head Mounted Displays

Date: February 24, 2014
Speaker: Kiyoshi Kiyokawa, Associate Professor, Cybermedia Center, Osaka University

Abstract: In this talk, I will introduce the design of a wide field-of-view optical see-through head mounted display we have studied, as well as a series of user studies on its effectiveness in augmented reality. The development of a wide field-of-view optical see-through head mounted display (HMD) has been a technological challenge for decades. There exist wide-view closed-type HMDs using multiple display units (tiling) or multiple curved mirrors. The former approach tends to be expensive and heavy, whereas the latter tends to suffer from image distortion and a small exit pupil. In order to provide a wide-view image with a large exit pupil, I will first introduce the design considerations and implementation of our head mounted projective display (HMPD) using a hyperboloidal half-silvered mirror. This hyperboloidal HMPD, or HHMPD, successfully provides a wide field-of-view, a large observational pupil, and optical see-through capability. Next, I will introduce the design considerations and prototyping of a simple head-attached screen that is both retro-reflective and semi-transparent. This screen allows the HHMPD to be used in a wearable situation. I will then introduce a series of user studies that assessed different aspects of the subjective image quality of the wearable HHMPD, such as visual acuity, perceptual distance, and the recognizability of information presented in peripheral vision. I will also introduce user studies on the effectiveness of wide-view see-through HMDs in a simulated outdoor environment, using a CAVE-like IPT display, which examined the relationship between information discovery rates and the field-of-view under a set of different annotation rendering policies. I will also introduce another user study on the effectiveness of the HHMPD that was actually conducted outdoors. Finally, I will summarize the talk and give future directions and visions.

Convex variational methods for computer vision and image processing

Date: November 29, 2013
Speaker: Dr. Thomas Pock, Institute for Computer Graphics and Vision, Graz University of Technology

Abstract: Energy minimization methods are among the most successful approaches to solve problems in computer vision, image processing and machine learning. Unfortunately, many interesting problems lead to non-smooth and in particular to non-convex optimization problems. In this talk I will discuss different strategies to tackle non-convex problems by convex methods, leading to very efficient and in some cases globally optimal algorithms.
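
A classic instance of the convex variational models meant here is total-variation (ROF) denoising, which is convex despite being non-smooth:

```latex
\min_{u} \;\; \int_\Omega \lvert \nabla u \rvert \, dx
  \;+\; \frac{\lambda}{2} \int_\Omega \big(u - f\big)^2 \, dx
```

Here f is the noisy input image, u the sought denoised image, and \lambda balances regularization against data fidelity. Typical strategies for the harder problems mentioned in the abstract include replacing a non-convex term with a convex surrogate of this kind, or lifting the problem to a higher-dimensional space in which a convex relaxation exists.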

Dense 3D scene flow estimation for locally rigid scenes

Date: October 16, 2013
Speaker: Dr. Konrad Schindler, ETH Zurich

Abstract: I will present recent work on 3D scene flow estimation, i.e., the dense reconstruction of 3D geometry and the motion field from two or more views. Scene flow estimation is a combination of the dense stereo and dense optical flow problems, and like these two it is ill-posed and needs some form of regularization. In our work we exploit the fact that image motion is frequently dominated by the observer's motion and a number of rigidly moving objects. I will talk about two different models derived from this assumption: the first encourages rigidity in local neighborhoods; the second explicitly models the scene as piecewise planar and rigid.
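
Written generically (a sketch of the problem structure rather than the speakers' exact formulations), such an estimator minimizes a multi-view data term plus a rigidity regularizer over local neighbourhoods:

```latex
E(D, \mathbf{S}) \;=\; \sum_{\mathbf{x}}
    \rho_{\text{data}}\big(\mathbf{x};\, D(\mathbf{x}),\, \mathbf{S}(\mathbf{x})\big)
  \;+\; \lambda \sum_{\mathbf{x}} \sum_{\mathbf{x}' \in \mathcal{N}(\mathbf{x})}
    \rho_{\text{rigid}}\big(\mathbf{S}(\mathbf{x}),\, \mathbf{S}(\mathbf{x}')\big)
```

Here D is the per-pixel depth, \mathbf{S} the per-pixel 3D motion, \rho_{\text{data}} penalizes photometric inconsistency across the stereo and temporal views, and \rho_{\text{rigid}} penalizes neighbouring points whose motions cannot be explained by a common rigid transformation; the second model mentioned above instead assigns pixels to piecewise-planar, rigidly moving segments.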

Download the presentation slides.

Seeing Through Augmented Reality

Date: August 22, 2013
Speaker: Prof. Steven Feiner, Columbia University

Abstract: Researchers have been actively exploring Augmented Reality (AR) for over forty years, first in the lab and later in the streets. In this talk, I will share my thoughts about where AR is now, and where it might be headed in the future. I will illustrate the talk with examples from work being done by Columbia’s Computer Graphics and User Interfaces Lab. One project assists users in performing complex physical tasks, such as equipment maintenance and repair. Another supports rapid vicarious travel around a physical environment, to accomplish tasks in which users must switch among different viewing perspectives.

Download the presentation slides.

AR and VR everywhere?

Date: June 21, 2013
Speaker: Prof. Tobias Höllerer, University of California, Santa Barbara

Abstract: AR and VR hold enormous promise as paradigm-shifting ubiquitous technologies. We have all seen indications of this potential, but clearly most of us are still far from using AR or VR on a regular basis, everywhere. What will it take to get closer to mass adoption? This talk will present some of the work of UCSB's Four Eyes Laboratory in the areas of mobile Augmented Reality and Mixed Reality Simulation. It discusses existing success stories and ongoing limitations of AR and VR technologies on the roadmap to seamless interaction with the physical world. New developments in sensors and real-time computer vision, coupled with near-universal connectivity, may make it possible to finally scale the user interface experience from small screens to the wider context of the world before us. And on the way there, somebody will surely want your data...

Download the presentation slides.

Interaction Technology, Interfaces and The End of Reality

Date: May 14, 2013
Speaker: Alex Olwal, Ph.D., MIT Media Lab, Cambridge, United States

Abstract: The 21st century has introduced an array of technologies that strive to enable the seamless mix of virtual content and real-world environments. The ability to augment the human senses holds great potential, but could there be an inherent conflict with our natural experiences of our surroundings?

Emerging head-worn displays, such as Google Glass, promise to simplify and amplify our experiences of the real world, but do we also risk diminishing and reducing the value of an unaltered reality?

The first part of this talk will provide an overview of our technical research on merging virtual information with the real environment using novel display systems, sensing technologies, interaction techniques and contextual user interfaces.

The second part will explore, in a broader sense, some potentially unexpected implications of AR, such as intentionally impractical AR experiences, cognitive overload, addictive telepresence, and the fine line between augmentation and dehumanization.

Download the presentation slides.