A Study on Environmental Sound Modeling Based on Deep Learning

Tomoki Hayashi

2021

Recent improvements in machine learning techniques have opened new opportunities to analyze every possible sound in real-world situations, that is, to understand environmental sound. This is a challenging problem because the goal is to understand every possible sound in a given environment, from the sound of glass breaking to the crying of children. This chapter focuses on Sound Event Detection (SED), one of the most important tasks in environmental sound understanding, and addresses three problems that affect the performance of monophonic, polyphonic, and anomalous SED. The first problem is how to combine multi-modal signals to extend the range of detectable sound events to human activities. The second is how to model the duration of sound events, an essential characteristic, to improve polyphonic SED performance. The third is how to model normal environments in the time domain to improve anomalous SED systems. This chapter describes how the proposed methods address each problem and demonstrates their effectiveness in improving the performance of each SED task. Furthermore, a discussion of the relationship between each work and Real-World Data Circulation (RWDC) clarifies what kind of data circulation each work accomplishes.
