An Approach of XML-ifying the Crude Corpus in the Field of Opinion Mining

Debnath Bhattacharyya, Kheyali Mitra, Min-kyu Choi, Debashis Ganguly

2009

This paper presents a simple approach to XML-ifying a crude corpus for Opinion Mining. The XML-ification is based on regular expressions. A corpus (plural: corpora) is a collection of linguistic data. In the proposed work, the corpus consists of reviews posted on web sites, more specifically product reviews. The reviews, or opinions, are contained in HTML files collected from sites such as Cnet.com, Epinions.com, Amazon.com, and ebay.com. Once the crude corpus of HTML files has been collected, it is refined to keep only the required review details from each web page and to discard the rest. This refined corpus is then processed again to yield the final output: XML files that contain only the important parts of the review details from the raw HTML page. These XML files are ready for further steps of Opinion Mining, such as part-of-speech (POS) tagging or other language processing for machine learning.
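
The pipeline described in the abstract (raw HTML page, regular-expression extraction of the review details, XML output) can be illustrated with a minimal Python sketch. The abstract does not give the actual patterns or the XML schema, so everything specific here is an assumption made for illustration: the `<div class="review">` markup, the `<h3>` and `<p>` conventions, the element names `reviews`, `review`, `title`, and `text`, and the function names `clean` and `xmlify`.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical markup: each review site has its own layout, so these
# patterns are assumptions and would need to be adapted per site.
REVIEW_RE = re.compile(r'<div class="review">(.*?)</div>', re.DOTALL | re.IGNORECASE)
TITLE_RE = re.compile(r'<h3>(.*?)</h3>', re.DOTALL | re.IGNORECASE)
BODY_RE = re.compile(r'<p>(.*?)</p>', re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r'<[^>]+>')


def clean(fragment):
    """Strip leftover tags, decode HTML entities, and collapse whitespace."""
    text = unescape(TAG_RE.sub(' ', fragment))
    return ' '.join(text.split())


def xmlify(html, source):
    """Keep only the review blocks of an HTML page and return them as XML."""
    root = ET.Element('reviews', {'source': source})
    for block in REVIEW_RE.findall(html):
        review = ET.SubElement(root, 'review')
        title = TITLE_RE.search(block)
        ET.SubElement(review, 'title').text = clean(title.group(1)) if title else ''
        body = ' '.join(clean(p) for p in BODY_RE.findall(block))
        ET.SubElement(review, 'text').text = body
    # ElementTree escapes characters such as & and < in the output.
    return ET.tostring(root, encoding='unicode')


# Made-up sample page; the navigation block is dropped, the reviews are kept.
sample = '''<html><body><div id="nav">Home</div>
<div class="review"><h3>Great camera</h3><p>Sharp <b>pictures</b> &amp; long battery life.</p></div>
<div class="review"><h3>Disappointing</h3><p>Stopped working after a week.</p></div>
</body></html>'''

print(xmlify(sample, 'example'))
```

The non-greedy review pattern stops at the first closing `</div>`, so this sketch only works when review blocks contain no nested `<div>` elements; pages with deeper nesting would need more careful patterns or a preliminary clean-up pass.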
