CrowdMD: Crowdsourcing-based approach for deduplication
2015
Matching dependencies (MDs) were recently introduced as quality rules for data cleaning and entity resolution. They are rules that specify which values should be considered duplicates and therefore have to be matched. Defining such quality rules on a database instance is an expensive and time-consuming process that requires considerable effort to analyse the whole database. In this demo paper, we present CrowdMD, a hybrid machine-crowd system for generating MDs. It first asks the crowd to determine whether each pair in a training sample matches or not. Then, it uses data mining techniques to identify the attributes that constitute an MD. Using a Restaurant database, we will show how crowd workers can help generate MDs by labelling the training sample through the CrowdMD user interface, and how MDs can be mined from this labelled training set.
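The abstract does not specify CrowdMD's similarity functions, how crowd answers are aggregated, or which mining algorithm is used, so the following is only a minimal sketch of the two-step idea under stated assumptions. An MD is represented here by its left-hand-side attributes: if two tuples have similar values on all of them, the tuples are treated as duplicates whose values should be matched. Assumed, and not taken from the paper: similarity is `difflib.SequenceMatcher` ratio with a hypothetical 0.8 threshold, crowd votes are combined by simple majority, and the "mining" step is a brute-force search over small attribute sets standing in for the system's actual data mining technique. The helper names (`similar`, `majority_label`, `md_fires`, `mine_md_lhs`) and the restaurant records and votes are invented for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical record type: one restaurant tuple, attribute name -> value.
Record = dict[str, str]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Approximate-match predicate; SequenceMatcher and 0.8 are assumptions."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def majority_label(votes: list[bool]) -> bool:
    """Aggregate crowd answers for one pair; a tie counts as 'no match'."""
    return sum(votes) > len(votes) / 2


def md_fires(lhs: tuple[str, ...], r1: Record, r2: Record) -> bool:
    """True when every LHS attribute is similar, i.e. the MD says 'match'."""
    return all(similar(r1[a], r2[a]) for a in lhs)


def mine_md_lhs(pairs: list[tuple[Record, Record]], labels: list[bool],
                attributes: list[str], max_size: int = 2):
    """Brute-force stand-in for the mining step: return the attribute set whose
    similarity conjunction best reproduces the crowd labels, preferring
    smaller sets on ties."""
    best, best_acc = None, -1.0
    for k in range(1, max_size + 1):          # smaller sets are tried first
        for lhs in combinations(attributes, k):
            hits = sum(md_fires(lhs, r1, r2) == y
                       for (r1, r2), y in zip(pairs, labels))
            acc = hits / len(pairs)
            if acc > best_acc:                # strict '>' keeps the earlier, smaller set
                best, best_acc = lhs, acc
    return best, best_acc


# Invented training sample with three hypothetical crowd votes per pair.
def rec(name: str, phone: str, city: str) -> Record:
    return {"name": name, "phone": phone, "city": city}


pairs = [
    (rec("Spago", "310-385-0880", "Los Angeles"),
     rec("Spago", "310-385-0880", "Los Angeles")),
    (rec("Starbucks", "212-627-4438", "New York"),
     rec("Starbucks", "310-859-2071", "Los Angeles")),
    (rec("Katz's Delicatessen", "212-254-2246", "New York"),
     rec("Le Bernardin", "646-233-0404", "New York")),
    (rec("Art's Deli", "818-762-1221", "Studio City"),
     rec("Art's Delicatessen", "818-762-1221", "Studio City")),
]
votes = [[True, True, False], [False, False, True],
         [False, False, False], [True, True, True]]
labels = [majority_label(v) for v in votes]

lhs, acc = mine_md_lhs(pairs, labels, ["name", "phone", "city"])
print(lhs, acc)  # ('phone',) 1.0
```

On this toy sample, name fails because a chain name repeats across cities and "Art's Deli" falls below the assumed threshold, while city fails on two distinct New York restaurants; only phone agrees with every crowd label. A real system would also need to tune thresholds and avoid overfitting a small labelled sample, which this sketch does not attempt.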