Modeling Worker Performance Based on Intra-rater Reliability in Crowdsourcing: A Case Study of Speech Quality Assessment

2019 
Crowdsourcing has become a convenient instrument for addressing subjective user studies to a large number of users. Data from crowdsourcing can be corrupted due to users' neglect, and different mechanisms have been proposed to assess users' reliability and to ensure valid experimental results. Users who are consistent in their answers, i.e., who show a high intra-rater reliability score, are desirable for subjective studies. This work investigates the relationship between intra-rater reliability and user performance in the context of a speech quality assessment task. To this end, a crowdsourcing study was conducted in which users were asked to rate speech stimuli with respect to their overall quality. Ratings were collected on a 5-point scale in accordance with ITU-T Rec. P.808. The speech stimuli were taken from the database of ITU-T Rec. P.501 Annex D, and the results are contrasted with ratings collected in a laboratory experiment. Furthermore, a model was built to predict listener performance as a function of intra-rater reliability, the root-mean-squared deviation between the listeners' ratings, and age. Such a model is intended to provide a measure of how valid crowdsourcing results are when there are no laboratory results to compare against.
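To make the two quantities in the model concrete, below is a minimal Python sketch, assuming intra-rater reliability is operationalized as the Pearson correlation between a listener's two ratings of repeated stimuli, and RMSD is taken against the mean ratings of the other listeners. The function names, the example data, and the choice of reference are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

def intra_rater_reliability(first_pass, second_pass):
    # One common operationalization: correlation between a listener's
    # ratings of the same stimuli presented twice. The paper's exact
    # metric may differ.
    r, _ = pearsonr(first_pass, second_pass)
    return r

def rmsd(listener_ratings, reference_ratings):
    # Root-mean-squared deviation between a listener's ratings and a
    # reference, here assumed to be the other listeners' mean ratings.
    diff = np.asarray(listener_ratings, dtype=float) - np.asarray(reference_ratings, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical example: ten stimuli rated twice on the 5-point ACR scale.
first  = [4, 3, 5, 2, 4, 3, 1, 5, 2, 4]
second = [4, 3, 4, 2, 5, 3, 2, 5, 2, 4]
others_mean = [4.1, 2.8, 4.6, 2.2, 4.3, 3.1, 1.5, 4.8, 2.4, 3.9]

print("intra-rater reliability:", intra_rater_reliability(first, second))
print("RMSD to other listeners:", rmsd(first, others_mean))
```

These two features, together with age, would then feed a regression model of listener performance of the kind the abstract describes.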