Text Data Truth Discovery Using Self-confidence of Sources

2020 
In the era of big data, the same question can receive many answers from multiple sources, and these answers may conflict with each other. How to obtain the true information (i.e., the truths) from these answers has therefore become a hot research topic. Many existing truth discovery methods estimate source reliability to improve the quality of the discovered truths. However, most of them can only handle categorical or numerical data and perform poorly on text data. Meanwhile, we observe that text data contains not only the answer to a question but also some implicit information, expressed by words and phrases such as "possible", "may", "make sure", "similar", and "the same as". These words have nothing to do with the answer itself, but they can reflect the source's degree of self-confidence. In this paper, we propose a truth discovery framework that takes this implicit information into account. We first analyze the text data and extract the answers to the questions; we then build two dictionaries, one of self-confidence increasing words and one of self-confidence decreasing words. Using these dictionaries, we extract self-confidence information from the answer descriptions. Finally, we take full advantage of this self-confidence information to improve the performance of truth discovery. We perform experiments on a categorical dataset and a real-world Chinese text dataset. Compared with other methods, our framework performs better, which demonstrates its superiority.
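The abstract does not state the scoring rule or the truth discovery update, so the following Python sketch is only an illustration under stated assumptions, not the paper's method. It assumes: hypothetical English cue dictionaries (`INCREASING`, `DECREASING`) holding only the example words named above, with the increasing/decreasing split guessed; a hypothetical `self_confidence` rule that adds or subtracts a fixed `step` per cue; and a hypothetical `discover` loop that does iterative weighted voting, where each vote counts source weight times self-confidence and source weights are re-estimated as a smoothed agreement rate with the current truths (a simple stand-in for the reliability updates used in truth discovery).

```python
import re
from collections import defaultdict

# Hypothetical cue dictionaries. The paper builds its own (for Chinese text);
# these English entries are only the examples named in the abstract, and the
# split into increasing/decreasing words is an assumption.
INCREASING = ["make sure", "the same as"]
DECREASING = ["possible", "may", "similar"]


def count_cues(text, phrases):
    """Count whole-word occurrences of each phrase in lower-cased text."""
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)


def self_confidence(description, base=1.0, step=0.2, floor=0.1):
    """Hypothetical rule: +step per increasing cue, -step per decreasing cue, clamped at floor."""
    text = description.lower()
    score = base + step * (count_cues(text, INCREASING) - count_cues(text, DECREASING))
    return max(score, floor)


def discover(claims, iterations=10):
    """Iterative weighted voting over (source, question, answer, description) claims.

    Each vote counts weight[source] * self_confidence(description); source
    weights are then re-estimated as a smoothed agreement rate with the truths.
    """
    conf = [self_confidence(desc) for _, _, _, desc in claims]
    weight = {src: 1.0 for src, _, _, _ in claims}  # start with equal reliability
    truths = {}
    for _ in range(iterations):
        # Step 1: for each question, pick the answer with the largest weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for (src, q, ans, _), c in zip(claims, conf):
            votes[q][ans] += weight[src] * c
        truths = {q: max(cands, key=cands.get) for q, cands in votes.items()}
        # Step 2: re-estimate each source's weight from agreement with the truths.
        correct = defaultdict(int)
        total = defaultdict(int)
        for src, q, ans, _ in claims:
            total[src] += 1
            correct[src] += int(ans == truths[q])
        # Laplace-smoothed accuracy keeps every weight strictly between 0 and 1.
        weight = {src: (correct[src] + 1) / (total[src] + 2) for src in total}
    return truths, weight
```

Because cues are matched on word boundaries, "may" does not fire inside "maybe". Scaling each vote by self-confidence lets a hedged answer ("it may be B") count for less than an assertive one from an equally reliable source, which is the intuition the abstract describes.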