The Crux of Voice (In)Security: A Brain Study of Speaker Legitimacy Detection

2019 
A new generation of scams has emerged that uses voice impersonation to obtain sensitive information, eavesdrop on voice calls, and extort money from unsuspecting human users. Research demonstrates that users are susceptible to voice impersonation attacks that exploit current advances in speech synthesis. In this paper, we set out to elicit a deeper understanding of such human-centered "voice hacking" based on a neuro-scientific methodology (thereby corroborating and expanding the traditional behavioral-only approach in significant ways). Specifically, we investigate the *neural underpinnings* of voice security through *functional near-infrared spectroscopy* (fNIRS), a cutting-edge neuroimaging technique that captures neural signals in both temporal and spatial domains. We design and conduct an fNIRS study to pursue a thorough investigation of users' mental processing related to *speaker legitimacy detection* – whether a voice sample is rendered by the target speaker, a different human speaker, or a synthesizer mimicking the speaker. We analyze the neural activity associated with this task as well as the brain areas that may control such activity. Our key insight is that there may be no statistically significant differences in the way the human brain processes *legitimate speakers vs. synthesized speakers*, whereas clear differences are visible when encountering *legitimate vs. different human speakers*. This finding may help explain users' susceptibility to synthesized attacks, as also seen in the behavioral self-reported analysis. That is, impersonated synthesized voices may be *indistinguishable* from real voices from both behavioral and neural perspectives. In sharp contrast, prior studies showed *subconscious* neural differences between real and fake artifacts of other kinds (e.g., paintings and websites), even when users failed to note these differences behaviorally. Overall, our work dissects the fundamental neural patterns underlying voice-based insecurity and reveals users' susceptibility to voice synthesis attacks at a biological level. We believe this is a significant insight for the security community: human detection of voice synthesis attacks may not improve over time, especially as voice synthesis techniques continue to improve, calling for the design of careful machine-assisted techniques to help humans counter these attacks.
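To make the reported comparison concrete, below is a minimal sketch of the kind of paired, per-participant statistical test the abstract alludes to: comparing mean fNIRS activation (e.g., oxygenated-hemoglobin response estimates) across speaker conditions. This is an illustration only, not the authors' actual analysis pipeline; all data are simulated and all variable names are hypothetical, assuming a within-subjects design where each participant hears every condition.

```python
# Hypothetical sketch of a paired condition comparison on fNIRS
# activation values. Data are simulated; this is NOT the paper's
# actual analysis, only an illustration of a paired t-test design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_participants = 20

# Simulated mean activation per participant for each condition.
legit = rng.normal(loc=1.0, scale=0.3, size=n_participants)
# Synthesized voices: essentially no shift relative to legitimate.
synthesized = legit + rng.normal(loc=0.02, scale=0.25, size=n_participants)
# A different human speaker: a clear shift relative to legitimate.
other_human = legit + rng.normal(loc=0.40, scale=0.25, size=n_participants)

for label, cond in [("legitimate vs. synthesized", synthesized),
                    ("legitimate vs. other human", other_human)]:
    # Paired t-test across participants for this condition pair.
    t, p = stats.ttest_rel(legit, cond)
    print(f"{label}: t = {t:.2f}, p = {p:.4f}")
```

Under this simulation, the first comparison tends to yield a non-significant p-value and the second a significant one, mirroring the pattern of results the abstract describes.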