Viewing television (TV) with family and friends makes the experience more enjoyable. Recently, as more people watch TV programs on their mobile devices via the internet, opportunities to watch TV in groups are decreasing. We propose a companion robot to enhance human communication during the TV-viewing experience. The robot extracts keywords from the video, audio, and subtitle data of the TV program being watched, generates utterances from the keywords, and asks people questions related to the TV program. These questions prompt viewers to chat with the robot. To confirm the utility of the robot, we conducted an experiment in which users watched TV with the robot. Over 70% of the participants responded that the robot “promoted active conversations among people” and “created a more relaxed atmosphere.” Thus, our study demonstrates that a TV-watching robot can create novel and rich media experiences and stimulate communication among people.
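The keyword-to-question flow described above can be illustrated with a minimal sketch. It is not the robot's actual implementation: it assumes subtitle text only (no video or audio), English input, a simple frequency count as the keyword extractor, and a fixed question template. The names `extract_keywords`, `make_question`, and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a proper
# language-specific resource (the original work targets Japanese TV).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to",
             "this", "that", "it", "on", "for", "with"}


def extract_keywords(subtitle_text, top_k=2):
    """Return the top_k most frequent non-stopword tokens (assumed heuristic)."""
    words = re.findall(r"[a-z]+", subtitle_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_k)]


def make_question(keyword):
    """Turn a keyword into a question using a fixed, illustrative template."""
    return f"Have you ever heard about {keyword}?"


if __name__ == "__main__":
    subtitles = ("The volcano erupted. The volcano is in Iceland, "
                 "and Iceland is cold. The volcano is big.")
    for keyword in extract_keywords(subtitles):
        print(make_question(keyword))
```

A template keeps the sketch short; the robot described here generates its utterances from the keywords rather than filling a fixed pattern.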
This study presents a method for generating utterances for companion robots that watch TV with people, using TV program subtitles. To enable the robot to automatically generate relevant utterances while watching TV, we created a dataset of approximately 12,000 utterances that were manually added to the collected TV subtitles. Using this dataset, we fine-tuned a large-scale language model to construct an utterance generation model. The proposed model takes multiple keywords extracted from the subtitles as topics and also takes the subtitles themselves as input, so that the generated utterances reflect the context of the program. The evaluation of the generated utterances revealed that approximately 88% of the sentences were natural Japanese, and approximately 75% were relevant and natural in the context of the TV program. Moreover, approximately 99% of the sentences contained the extracted keywords, indicating that our proposed method can generate diverse and contextually appropriate utterances containing the targeted topics. These findings provide evidence of the effectiveness of our approach in generating natural utterances for companion robots that watch TV with people.
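As a rough illustration of the two ideas above, the sketch below shows how keywords and subtitle context might be combined into a single model input, and how a keyword-inclusion rate could be computed over generated utterances. The input format, the `[SEP]` marker, the three-line context window, and the function names are all assumptions made for illustration; the abstract does not specify them. The coverage function also assumes that an utterance counts only if it contains every target keyword, which is one possible reading of the reported figure.

```python
def build_model_input(keywords, subtitles, max_context_lines=3):
    """Join topic keywords and recent subtitle lines into one input string.

    The layout ("keywords: ... [SEP] context: ...") is a hypothetical format,
    not the one used by the fine-tuned model described in the text.
    """
    context = " ".join(subtitles[-max_context_lines:])
    topics = ", ".join(keywords)
    return f"keywords: {topics} [SEP] context: {context}"


def keyword_coverage(utterances, keyword_sets):
    """Fraction of utterances that contain all of their target keywords."""
    if not utterances:
        return 0.0
    hits = sum(
        all(keyword in utterance for keyword in keywords)
        for utterance, keywords in zip(utterances, keyword_sets)
    )
    return hits / len(utterances)


if __name__ == "__main__":
    subs = ["Welcome back.", "Today we visit the zoo.", "The pandas are eating bamboo."]
    print(build_model_input(["pandas", "zoo"], subs))

    generated = ["The pandas at the zoo look happy.", "I like cats."]
    targets = [["pandas", "zoo"], ["dogs"]]
    print(keyword_coverage(generated, targets))
```

Substring matching is used here for simplicity; for Japanese text, matching on the output of a morphological analyzer may be more reliable.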