Wineinformatics: Applying Data Mining on Wine Sensory Reviews Processed by the Computational Wine Wheel

2014 
As the world becomes more digital, data science has emerged as a successful discipline that incorporates techniques and theories from distinct fields. Among its components, domain knowledge may be the most important, since every data science project starts with a domain problem and ends with useful information within that domain. Identifying new application domains is therefore considered fundamental research in the area. Wine was once considered a luxury, but today it is popular and enjoyed by a wide variety of people. Professional wine reviews provide insights on the tens of thousands of wines available each year. Currently, however, there is no systematic way to use this large number of reviews to benefit wine makers, distributors, and consumers. This project proposes a new data science area named Wineinformatics. To automatically retrieve wines' flavors and characteristics from reviews, which are stored in natural-language form, we propose a novel "Computational Wine Wheel" to extract keywords. Two publicly available datasets are produced with the new method. A hierarchical clustering algorithm is applied to the first dataset and yields meaningful clusters. An association rules algorithm is applied to the second dataset to predict, from the wine's savory review, whether a wine scores above 90 points. 5-fold cross-validation experiments are run under different parameter settings and yield accuracies ranging from 73% to 82%. This new domain will bring huge benefits to fields as diverse as computer science, statistics, business, and agriculture.
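The keyword-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a tiny, hypothetical phrase-to-attribute dictionary (`WHEEL`) standing in for the Computational Wine Wheel, whose actual contents are not given here. The helper names, the binary-vector encoding, and the choice of Jaccard distance as a dissimilarity for clustering are likewise assumptions for illustration, not the method confirmed by the paper.

```python
import re

# Hypothetical mini-dictionary: surface phrase -> normalized attribute.
# Illustrative only; not the actual Computational Wine Wheel.
WHEEL = {
    "black cherry": "CHERRY",
    "cherry": "CHERRY",
    "blackberry": "BLACKBERRY",
    "vanilla": "VANILLA",
    "oak": "OAK",
    "oaky": "OAK",
    "firm tannins": "TANNINS_FIRM",
}

# Fixed attribute order used for the binary encoding.
VOCABULARY = sorted(set(WHEEL.values()))


def extract_attributes(review):
    """Return the set of normalized attributes mentioned in a review."""
    text = review.lower()
    found = set()
    # Match longer phrases first so "black cherry" wins over "cherry".
    for phrase in sorted(WHEEL, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            found.add(WHEEL[phrase])
            # Blank out the match so shorter phrases do not re-match it.
            text = re.sub(pattern, " ", text)
    return found


def to_binary_vector(attrs, vocabulary=VOCABULARY):
    """Encode an attribute set as a 0/1 vector over the vocabulary."""
    return [1 if a in attrs else 0 for a in vocabulary]


def jaccard_distance(a, b):
    """Dissimilarity between two attribute sets (0 = identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    review_a = "Ripe black cherry and vanilla, framed by firm tannins."
    review_b = "Oaky, with blackberry and cherry notes."
    attrs_a = extract_attributes(review_a)
    attrs_b = extract_attributes(review_b)
    print(sorted(attrs_a), to_binary_vector(attrs_a))
    print(sorted(attrs_b), to_binary_vector(attrs_b))
    print(jaccard_distance(attrs_a, attrs_b))
```

Binary vectors of this kind could then be fed to a hierarchical clustering routine or, together with a label for scoring above 90 points, to an association-rule classifier.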