Influence of Number of Stimuli for Subjective Speech Quality Assessment in Crowdsourcing

2018 
Nowadays, crowdsourcing provides an exceptional opportunity for conducting subjective user tests on the Internet with a demographically diverse audience. Previous work has pointed out that the offered tasks should be kept short, so participants evaluate only a portion of the dataset at a time. Aspects such as workload and fatigue are important because they relate to a central question: how can the study design be optimized without compromising the quality of the results by tiring the test participants? This work investigates the influence of the number of presented speech stimuli on the reliability of listeners' ratings in the context of subjective speech quality assessment. A crowdsourcing study has been conducted with 209 listeners who were asked to rate speech stimuli with respect to their overall quality. Participants were randomly assigned to one of three user groups, each of which was confronted with tasks consisting of a different number of stimuli: 10, 20, or 40. The results from all three groups are highly correlated with existing laboratory ratings, with the group rating the largest number of samples showing the highest correlation. However, participant retention decreased while the study completion time increased. Thus, it may be desirable to offer tasks with fewer speech stimuli, sacrificing rating accuracy to some extent.
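As an illustration of the kind of analysis described above (not the authors' code), the following minimal Python sketch aggregates hypothetical crowdsourced ratings into per-stimulus mean opinion scores (MOS) and correlates them with assumed laboratory MOS values; the data, stimulus names, and use of SciPy's Pearson correlation are all assumptions for demonstration purposes.

```python
# Minimal sketch: per-stimulus MOS from crowd ratings vs. laboratory MOS.
# All data below is hypothetical and only illustrates the correlation step.
from statistics import mean
from scipy.stats import pearsonr

# Hypothetical crowd ratings on a 1-5 scale, keyed by stimulus ID.
crowd_ratings = {
    "stim_01": [4, 5, 4, 3, 4],
    "stim_02": [2, 3, 2, 2, 3],
    "stim_03": [5, 4, 5, 5, 4],
}
# Hypothetical laboratory MOS for the same stimuli.
lab_mos = {"stim_01": 4.1, "stim_02": 2.4, "stim_03": 4.7}

# Aggregate crowd ratings into per-stimulus MOS values.
crowd_mos = {stim: mean(r) for stim, r in crowd_ratings.items()}

# Pearson correlation between crowdsourced and laboratory MOS
# over the common set of stimuli.
stimuli = sorted(set(crowd_mos) & set(lab_mos))
r, p = pearsonr([crowd_mos[s] for s in stimuli],
                [lab_mos[s] for s in stimuli])
print(f"Pearson r = {r:.3f} (p = {p:.3f})")
```

In a real study this comparison would be run separately for each group (10, 20, or 40 stimuli per task) to see how the number of stimuli per task affects agreement with the laboratory ratings.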