A Big Data Preprocessing using Statistical Text Mining

Sunghae Jun

2015

Abstract

Big data has been used in diverse areas. For example, computer science and sociology differ in the issues they focus on when approaching big data, but they share the same goal: analyzing big data and drawing implications from the analysis results. Meaningful analysis and interpretation of big data are therefore needed in most areas. Statistics and machine learning provide various methods for big data analysis. In this paper, we study a process for big data analysis and propose an efficient methodology covering the entire process, from collecting big data to drawing implications from the analysis results. In addition, because patent documents have the characteristics of big data, we propose an approach that applies big data analysis to patent data and uses the results to build an R&D strategy. To illustrate how the proposed methodology applies to a real problem, we perform a case study using applied and registered patent documents retrieved from patent databases around the world.

Key Words: Big Data Analysis, Statistics, Natural Language Processing, Text Mining, Patent Analysis, Linear Model.

Received: Aug. 28, 2015; Revised: Sep. 17, 2015; Accepted: Sep. 19, 2015
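The abstract does not spell out the preprocessing steps, but a common first step in statistical text mining of patent documents is to turn each document into a row of term counts (a document-term matrix) that statistical models, such as the linear models listed in the key words, can then use. The following is a minimal sketch under stated assumptions, not the author's method: the stop-word list, the tokenization rule (lowercase alphabetic tokens only), and the toy patent abstracts are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "for", "in", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Sorted union of all terms gives a stable column order.
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    # Toy patent-like abstracts, invented for illustration (not the paper's data).
    patents = [
        "A battery cell with a solid electrolyte.",
        "A method for charging the battery of an electric vehicle.",
    ]
    vocab, dtm = document_term_matrix(patents)
    print(vocab)
    for row in dtm:
        print(row)
```

Each row of the resulting matrix is a fixed-length numeric vector, which is what lets standard statistical methods be applied to otherwise unstructured patent text. In practice, a real pipeline would likely add steps such as stemming and frequency-based term filtering before modeling.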
