OnQ Blog

Double vision: The challenge of creating a world for both humans and robots

June 29, 2016

Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.

Carla Diana is a future-focused product designer, author, and educator who tracks trends in emerging technologies through hands-on projects and experiments. She teaches in the integrated product design program at the University of Pennsylvania and tweets as @carladiana_. The views expressed are the author’s own and do not necessarily represent the views of Qualcomm.

Alexa, Amazon’s voice-controlled home automation system, is great, except she wouldn’t play my favorite song, “Ana Ng” by They Might Be Giants. I tried saying it a dozen ways: “Anna Enge,” “Anna Ennnng,” “Anna! En! Ga!” Now, I just pronounce it the way Alexa wants to hear it: “Ana. N. G.” It’s just another item on the list of things that, thanks to the computers I talk to, now have mangled names. This happens because robots perceive only a minuscule subset of what we, as humans, do and say.

There’s a disconnect between how we see and hear the world and how our artificially intelligent companions do. Wrapped up in this disconnect is a problem and a challenge: Until computer vision and machine learning catch up to our human understanding of the world, we run the risk of creating a “robot-readable world” that, to us as people, might appear muted, harsh, and stilted.

Today, our devices and apps are like aliens, with whom we desperately want to communicate. After all, successful interaction means life can become easier; we get to hear the songs we want to, have speech automatically transcribed, or find directions to our destination without taking our hands off the steering wheel. So, for now, we’re letting the bots take over — just a little bit. We’ll pronounce words so they understand them and make specific gestures and facial expressions so that their cameras can read our cues. 

At first, these adjustments may not seem like a big deal, but if they’re continually happening, there’s a cost to our language and, ultimately, our culture. (After a few “Ana. N. G.” incidents, will that pronunciation be forever etched into the synapses of my brain?) We may even need to alter our speech to make it clear when we’re directly addressing robots. A recent NPR story about the Amazon Echo makes such a shift seem necessary; listeners’ Echoes began responding to phrases on the broadcast, controlling thermostats and speakers in their homes.

Going forward, these adaptations won’t stop at speech, either. Interactive design researcher Wendy Ju has suggested that we might need to alter our gait and pedestrian patterns at intersections to accommodate the movements of autonomous cars.  

Eventually, robotic accommodations may extend beyond behavior and affect how our world actually looks. A motorcycle augmented-reality display developed by University of Pennsylvania researcher Matthew O’Kelly sees unnatural colors like neon red and green more easily than muted grays and blues. Similarly, autonomous vehicles that rely on infrared-based Lidar mapping need clear markers in the landscape to locate themselves accurately. And both of these vision systems have trouble seeing black objects at a distance. Imagine: roads designed to accommodate these constraints would be lined with garishly colored signs, and sidewalks would be covered in Lidar-detectable clutter. Perhaps black cars will even fall out of favor.

As fascinating as it is to envision this future, it’s also concerning. Culture is a distinctly human thing, evolving organically over the ages and (in an ideal world) helping us improve through collective development and understanding. Picture cities built with robot readability in mind, with drone-friendly skylines and QR-code-laden intersections: many of the artifacts, signs, and phrases that carry meaning in our culture would be altered.

We’re already heading down this path, and we may reach a point where there is no turning back. “Meaning is a one-way gate,” Tom Guarriello, Ph.D., creator of the blog and podcast Robopsych: Exploring the Psychology of Human-Robot Interaction, recently explained to me. The idea that a robot-readable culture would be much more primitive and homogeneous than what we have now is concerning. Accents won’t be discerned, street signs will be exaggerated, and clever turns of phrase will be all but avoided. In essence, we will be accommodating the limitations of a much more narrow-minded and inexperienced culture.

For computer-vision engineers, pushing past the limitations of today’s robotics is an ongoing challenge, one that’s about much more than developing machines that can read codes and recognize abstract shapes. Through a combination of 3D cameras, processing algorithms, and extensive image databases, those at the forefront of computer vision have developed artificial intelligence that can see colors, read labels, and accurately identify and distinguish objects like cups and bottles. But there’s a long way to go before the systems can accurately process all the nuances of the visual and verbal world that we take for granted — for instance, the textural differences between the bark of an oak tree and that of a fir. 
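To make that gap concrete, here is a minimal sketch of the kind of object recognition such systems perform, using an off-the-shelf pretrained image classifier. It assumes PyTorch and torchvision are installed; the file name cup_on_desk.jpg is a hypothetical example photo, not anything referenced in this post.

```python
# Minimal sketch: naming an everyday object in a photo with an
# off-the-shelf pretrained classifier (PyTorch/torchvision assumed installed).
# "cup_on_desk.jpg" is a hypothetical example image.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT          # ImageNet-trained weights
model = models.resnet50(weights=weights).eval()    # inference mode
preprocess = weights.transforms()                  # resize, crop, normalize

img = Image.open("cup_on_desk.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)               # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)

top_probs, top_idxs = probs.topk(3)                # three most likely labels
for p, i in zip(top_probs[0], top_idxs[0]):
    label = weights.meta["categories"][i.item()]
    print(f"{label}: {p.item():.1%}")
```

A classifier like this can usually name the cup, but the textural difference between oak bark and fir bark mentioned above is exactly the kind of nuance it has no label for.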

In the meantime, software designers can provide a helpful stopgap by creating systems that learn through regular human interaction. For example, artificially intelligent assistants like Google Now could shift from cold, impersonal statements (“Approaching the grocery store. Here is your shopping list.”) to more conversational language (“It looks like you’re approaching the grocery store. Would you like to view your shopping list?”). Though this learning approach may require a little more input on the human side, an organically built history of interactions can inform the algorithm’s ongoing refinement. 
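As a rough illustration of that stopgap, here is a minimal sketch of a location-triggered prompt that uses conversational phrasing and keeps a record of responses so later prompts can adapt. The class and method names are hypothetical and do not correspond to Google Now or any shipping assistant.

```python
# Minimal sketch: a conversational, location-triggered prompt whose behavior
# is refined by a history of human responses. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ShoppingAssistant:
    # Past interactions as (prompt, accepted) pairs; this history is what
    # informs the ongoing refinement described above.
    history: list = field(default_factory=list)

    def on_near_store(self) -> str:
        # Conversational phrasing instead of a flat announcement.
        prompt = ("It looks like you're approaching the grocery store. "
                  "Would you like to view your shopping list?")
        # If the last few offers were all declined, back off to a quieter nudge.
        recent = self.history[-3:]
        if len(recent) == 3 and not any(accepted for _, accepted in recent):
            prompt = "You're near the grocery store. Say 'list' if you'd like your list."
        return prompt

    def record_response(self, prompt: str, accepted: bool) -> None:
        # Each answer becomes data for deciding how, or whether, to ask next time.
        self.history.append((prompt, accepted))

assistant = ShoppingAssistant()
print(assistant.on_near_store())
assistant.record_response("grocery prompt", accepted=True)
```

The design choice matters more than the code: every response a person gives is folded back into the system, so the assistant adapts to the human rather than the other way around.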

This evolutionary approach to product design counteracts the commonly held notion that new technologies emerge in isolation — a shiny robot isn’t going to show up on your doorstep and change life forever. No innovation (robotic, artificially intelligent, or otherwise) can exist in a vacuum, free from the context of culture. For this reason, we need interfaces and technologies that can learn from and embrace the artifacts of what has come before them. After all, who wants a robot that doesn’t get your BB-8 jokes?