A Big Data Preprocessing using Statistical Text Mining

2015 
Abstract
Big data is used in diverse areas. Computer science and sociology, for example, approach big data with different questions, but both use it in the same way: to analyze the data and draw implications from the results. Meaningful analysis and interpretation of big data are therefore needed in most areas. Statistics and machine learning provide various methods for big data analysis. In this paper, we study the process of big data analysis and propose an efficient methodology covering the entire process, from collecting big data to drawing implications from the analysis results. In addition, because patent documents have the characteristics of big data, we propose an approach that applies big data analysis to patent data and uses the results to build an R&D strategy. To illustrate how the proposed methodology applies to a real problem, we perform a case study using applied and registered patent documents retrieved from patent databases around the world.

Key Words: Big Data Analysis, Statistics, Natural Language Processing, Text Mining, Patent Analysis, Linear Model.

Received: Aug. 28, 2015
Revised: Sep. 17, 2015
Accepted: Sep. 19, 2015
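The abstract does not spell out its text-mining steps, but statistical text mining of patent documents commonly begins by turning each document into term counts. The following is a minimal, standard-library-only sketch of that general preprocessing step, not the authors' procedure. The stop-word list, the tokenizer rule (lowercase alphabetic runs only), and the sample patent titles are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "to", "in", "is", "with", "using"}


def tokenize(text):
    """Lowercase the text, keep runs of letters a-z, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Sorted union of all terms seen in any document.
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical patent titles used only to exercise the functions.
    docs = [
        "A battery cell with a solid electrolyte",
        "Method for charging a battery",
        "Solid state electrolyte for a battery cell",
    ]
    vocab, matrix = document_term_matrix(docs)
    print(vocab)
    for row in matrix:
        print(row)
```

The resulting matrix, with documents as rows and terms as columns, is the kind of numeric input to which statistical methods such as a linear model can then be applied.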