Addressee Estimation is the ability to understand to whom a person is directing an utterance. This ability is crucial for social robots engaging in multi-party interaction, since they need it to grasp the basic dynamics of social communication. In this project, we deploy a deep learning (DL) model on an iCub robot to estimate the placement of the addressee from the robot's first-person perspective, taking visual information about the speaker as input. Specifically, we extract two visual features from the iCub's camera stream, body pose vectors and face images of the speaker, and feed them to the model. The model classifies the addressee's placement as 'robot', 'left', or 'right', meaning respectively that the addressee is the robot, or is at the robot's left or right.
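
The final classification step can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the class order in `LABELS`, and the assumption that the model emits one raw score per class are all hypothetical, chosen only to show how three scores could be turned into one of the labels 'robot', 'left', or 'right'.

```python
import math
from typing import List, Sequence, Tuple

# Assumed class order; the real model's output order is not stated in the source.
LABELS = ("robot", "left", "right")


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_addressee(scores: Sequence[float]) -> Tuple[str, float]:
    """Return the most likely addressee placement and its probability.

    `scores` is a hypothetical stand-in for the model's raw output,
    one value per entry in LABELS.
    """
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]
```

With these assumptions, `decode_addressee([0.1, 2.0, -1.0])` selects 'left', because the second score is the largest. Returning the probability alongside the label would let a caller ignore low-confidence estimates, although whether the deployed system does so is not stated here.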
