"Hello? Who Am I Talking to?" A Shallow CNN Approach for Human vs. Bot Speech Classification

A. Lieto, D. Moro, F. Devoti, C. Parera, Vincenzo Lipari, Paolo Bestagini, Stefano Tubaro

2019

Automatic speech generation algorithms, enhanced by deep learning techniques, enable increasingly seamless and immediate machine-to-human interaction. As a result, the latest generation of phone-calling bots sounds more convincingly human than previous generations. The application of this technology has a strong social impact in terms of privacy issues (e.g., in customer-care services), fraudulent actions (e.g., social hacking) and erosion of trust (e.g., generation of fake conversations). For these reasons, it is crucial to identify whether a speaker is a human or a bot. In this paper, we propose a speech classification algorithm based on Convolutional Neural Networks (CNNs), which enables the automatic classification of human vs. non-human speakers from the analysis of short audio excerpts. We evaluate the effectiveness of the proposed solution using a database of real human speech, populated with audio recordings from various sources, together with speech generated automatically by state-of-the-art text-to-speech generators based on deep learning (e.g., Google WaveNet).
