Preprocessing Techniques for Effective Data Extraction and Computation

2013 
World Wide Web information is semi-structured because of the nested structure of HTML code; much of it is linked, and much of it is redundant. Web text mining supports the broader knowledge mining process by discovering and extracting valuable information from unstructured text. Unstructured texts contain a massive amount of information, but computers cannot use them directly for further processing. This paper therefore discusses the importance of standard preprocessing methods and the steps involved in extracting the required content effectively. It proposes a preprocessing and dimensionality reduction technique that simplifies and speeds up computation and can improve text categorization and performance.
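The abstract does not specify the individual steps, so the following is only a minimal sketch of what a standard Web text preprocessing pipeline could look like, not the paper's method. It assumes three common stages: stripping HTML markup, removing stop words, and pruning rare terms by document frequency as a simple form of dimensionality reduction. The names `html_to_tokens`, `reduce_vocabulary`, `STOP_WORDS`, and the `min_df` threshold are hypothetical and chosen for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "for"}


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_tokens(html):
    """Strip markup, lowercase, tokenize on letters, drop stop words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOP_WORDS]


def reduce_vocabulary(docs_tokens, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return sorted(t for t, n in df.items() if n >= min_df)


docs = [
    "<p>The web is <b>large</b></p><script>var x = 1;</script>",
    "<div>Mining the web</div>",
]
tokens = [html_to_tokens(d) for d in docs]
vocabulary = reduce_vocabulary(tokens, min_df=2)
```

Pruning by document frequency is only one of several possible reduction strategies; stemming or feature selection by information gain would fit the same pipeline shape.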