Development of perceptual systems for multimodal human-robot interaction
Abstract
One of the main challenges that society will face in the coming years is the provision of assistance for dependent people, more specifically, elderly people who may find it difficult to move or to perform daily chores autonomously. A similar scenario is expected for people with disabilities. This long-term care has traditionally been provided by family members, but recent demographic and social changes motivate its provision by professionals and institutions.
At the same time, we live in a technological era in which artificial intelligence and robotics will play a major role in the welfare state. Recent developments in these fields have led to the natural incorporation of robotic solutions into the daily life of most societies.
It is only logical, then, to propose the use of robots to assist dependent people. One of the most promising solutions is social robots, that is, robots that are able to communicate with people in a natural way, following social conventions. This interaction between people and robots is known as human-robot interaction (HRI), a multidisciplinary research field with many open problems. HRI applications must deal not only with classic robotic tasks, such as navigation or mapping, but also with tasks based on machine learning techniques, such as computer vision or natural language processing.
A social robot must be capable of analyzing its environment by capturing data through its sensors and then translating those data into knowledge that can be further exploited for HRI. In other words, the perceptual systems of a robotic platform are fundamental for extracting semantic information. Images captured by a camera or an RGB-D sensor can be used to identify the place where the robot is located or to detect nearby objects, while speech understanding during conversation can also become a valuable source of information about the context in which the robot is interacting.
The main goal of this dissertation is the design, development and evaluation of different perceptual systems that enable natural HRI. In particular, we will study different state-of-the-art algorithms to assess their suitability for robotic applications, where resource consumption and time constraints are especially relevant, and propose improved or novel solutions to ensure their applicability in robotics. Consequently, we will develop multimodal systems (that is, systems able to manage information from different sources, in this case visual and audio inputs) to extract and merge semantic information.
To validate our proposals, we have designed two use cases, Welcoming visitors and Catering for Granny Annie's Comfort, which correspond to tasks that are usually evaluated in competitions for service and assistance robots, such as RoCKIn@Home or RoboCup@Home. These two use cases make it possible to evaluate a robot's ability to adapt to different situations with different people, and to correctly analyze its environment.
Ultimately, the promising results of our evaluations show that the methods proposed in this dissertation bring us one step closer to our initial goal, that is, the development of social robots that can provide long-term care for dependent people.