Predicting meeting extracts in group discussions using multimodal convolutional neural networks

2017 
This study proposes the use of multimodal fusion models employing Convolutional Neural Networks (CNNs) to extract meeting minutes from a group discussion corpus. First, unimodal models are created from raw behavioral data such as speech, head motion, and face tracking. These models are then integrated into a fusion model that works as a classifier. The main advantage of this work is that the proposed models were trained without any hand-crafted features, yet they outperformed a baseline model trained on hand-crafted features. It was also found that multimodal fusion is useful when applying the CNN approach to modeling multimodal multiparty interaction.
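The architecture described above can be sketched as a late-fusion pipeline: each modality passes through its own convolutional feature extractor over raw frames, the pooled unimodal features are concatenated, and a classifier scores the fused vector. The sketch below is a minimal, untrained illustration of that idea in numpy; all shapes, filter sizes, and the two-class output are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid 1D convolution of a (T, C) signal with (K, C, F) kernels -> (T-K+1, F), ReLU."""
    K, C, F = kernels.shape
    T = x.shape[0]
    out = np.empty((T - K + 1, F))
    for t in range(T - K + 1):
        window = x[t:t + K]  # (K, C) slice of the input sequence
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def unimodal_features(x, kernels):
    """Convolution followed by global max pooling -> fixed-length feature vector (F,)."""
    return conv1d(x, kernels).max(axis=0)

# Hypothetical raw inputs: 100 frames per modality (channel counts are assumptions).
speech = rng.normal(size=(100, 1))  # e.g. per-frame audio energy
head   = rng.normal(size=(100, 3))  # head motion (x, y, z)
face   = rng.normal(size=(100, 2))  # face-tracking coordinates

# Random (untrained) kernels: width 5, 8 filters per modality.
k_speech = rng.normal(size=(5, 1, 8))
k_head   = rng.normal(size=(5, 3, 8))
k_face   = rng.normal(size=(5, 2, 8))

# Late fusion: concatenate unimodal feature vectors into one fused representation.
fused = np.concatenate([
    unimodal_features(speech, k_speech),
    unimodal_features(head, k_head),
    unimodal_features(face, k_face),
])  # (24,)

# Linear classifier over the fused vector: extract vs. do not extract.
W = rng.normal(size=(24, 2))
logits = fused @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax over the two classes
```

In a trained version of this sketch the kernels and `W` would be learned end-to-end from labeled discussion data, which is what lets the model work without hand-crafted features.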