    Mining Punjabi Text Using Clustering and Classification Techniques
    Citations: 0 · References: 16 · Related Papers: 20
    Abstract:
    Text mining is the field concerned with extracting useful, previously undiscovered information from text documents according to users' needs. Text classification is a text mining task that helps manage information efficiently by assigning documents to classes using classification and clustering algorithms. Each text document is characterized by a set of features, which should be relevant to the classification task. This paper introduces preprocessing techniques and feature selection methods for classifying Punjabi text documents with clustering and classification algorithms.
    Keywords:
    Document Clustering
    Document classification
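The preprocessing and feature steps named in the abstract above can be illustrated with a minimal sketch. It is not the paper's pipeline: the stop-word list is a hypothetical, deliberately tiny one, tokenization is plain whitespace splitting, and the function names `preprocess` and `term_frequencies` are invented for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption); a real system
# would use a curated Punjabi stop-word resource.
STOP_WORDS = {"ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਅਤੇ"}
# Punctuation stripped from token edges, including the danda sentence marker.
PUNCTUATION = "।,.?!\"'()"


def preprocess(text):
    """Split on whitespace, strip edge punctuation, and drop stop words."""
    tokens = (raw.strip(PUNCTUATION) for raw in text.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def term_frequencies(tokens):
    """Bag-of-words feature vector mapping each token to its count."""
    return Counter(tokens)


sample = "ਪੰਜਾਬ ਦਾ ਮੌਸਮ ਚੰਗਾ ਹੈ।"
features = term_frequencies(preprocess(sample))
```

Whitespace splitting is used instead of a `\w`-based regular expression because Gurmukhi vowel signs are combining marks, which a word-character pattern may treat as token boundaries.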
    A huge amount of data is stored today in the form of electronic documents. Text mining is the process of extracting information from those documents. Text classification is the process of assigning text documents to a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, and language identification. This paper presents a detailed survey of the text classification process and the various algorithms used in this field.
    Identification
    Statistical classification
    Sentiment Analysis
    Citations (99)
    Classification of text documents has become a necessity due to the growing availability of electronic data over the internet. Until now, no text classifier has been available for Punjabi documents. The objective of this work is to find the best text classifier for the Punjabi language. Two new algorithms, Ontology Based Classification and Hybrid Approach (a combination of Naive Bayes and Ontology Based Classification), are proposed for Punjabi text classification. A corpus of 180 Punjabi news articles is used for training and testing the classifier. The experimental results show that Ontology Based Classification (85%) and Hybrid Approach (85%) outperform the standard classification algorithms, Centroid Based Classification (71%) and Naive Bayes Classification (64%).
    Document classification
    Statistical classification
    Citations (22)
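Centroid Based Classification, one of the baselines named in the abstract above, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: documents are assumed to be already tokenized, each class centroid is the mean term-frequency vector of its training documents, and similarity is cosine. The English tokens and the names `train_centroids` and `classify` are placeholders.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled_docs):
    """Return label -> mean term-frequency vector over that label's documents."""
    sums = defaultdict(Counter)
    counts = Counter()
    for tokens, label in labelled_docs:
        sums[label].update(tokens)
        counts[label] += 1
    return {
        label: {term: c / counts[label] for term, c in vec.items()}
        for label, vec in sums.items()
    }


def classify(tokens, centroids):
    """Assign the label whose centroid is most similar to the document."""
    doc = Counter(tokens)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Placeholder training data (assumption), not the 180-article corpus.
training = [
    (["match", "team", "score"], "sports"),
    (["team", "player", "goal"], "sports"),
    (["election", "minister", "vote"], "politics"),
    (["vote", "party", "minister"], "politics"),
]
centroids = train_centroids(training)
```

Calling `classify(["player", "score", "team"], centroids)` compares the document against both centroids and returns the closer label.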
    With increasing access to electronic documents and the rapid growth of the World Wide Web, automatic document classification has become a key method for organizing information and discovering knowledge. Appropriate classification of electronic documents, online news, weblogs, emails, and digital libraries requires text mining, machine learning techniques, and natural language processing to obtain meaningful knowledge. The aim of this paper is to highlight the major techniques and methods applied in document classification. In this paper, we review some existing methods of text classification.
    Statistical classification
    One-class classification
    Document classification
    Citations (5)
    Data mining is the process of identifying knowledge in large data sets. Knowledge discovery from textual databases is the process of extracting interesting or non-trivial patterns from unstructured text documents. With the rapid growth of information, people increasingly want to extract knowledge from large text collections. A text mining framework consists of text preprocessing and techniques used to retrieve information, such as classification, clustering, summarization, information extraction, and visualization. Several text classification techniques are reviewed in this paper, including SVM, Naïve Bayes, KNN, association rules, and decision tree classifiers, which categorize text data into predefined classes. We study different text mining techniques for extracting relevant information on demand. The goal of the paper is to review and understand different text classification techniques and to find the best one from different perspectives. Based on this review, I propose a method that uses the best classification technique to improve result performance and indexing, and I present a comparison of the different classification techniques.
    Biomedical text mining
    Citations (19)
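Of the classifiers listed in the review abstract above, Naïve Bayes is the simplest to show in a few lines. The following is a minimal multinomial Naïve Bayes sketch with add-one (Laplace) smoothing, written for illustration only; the class name `NaiveBayes`, the toy spam/ham data, and the choice to ignore out-of-vocabulary tokens are assumptions, not details from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, labelled_docs):
        self.class_counts = Counter()           # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()
        for tokens, label in labelled_docs:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data (assumption) for demonstration only.
model = NaiveBayes().fit([
    (["free", "prize", "win"], "spam"),
    (["win", "cash", "now"], "spam"),
    (["meeting", "schedule", "today"], "ham"),
    (["project", "meeting", "notes"], "ham"),
])
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is why the scores are summed rather than multiplied.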
    Automatic document classification is an important step in organizing and mining documents. Information in documents is often conveyed using both text and images that complement each other. Typically, only the text content forms the basis for features that are used in document classification. In this paper, we explore the use of information from figure images to assist in this task. We explore image clustering as a basis for constructing visual words for representing documents. Once such visual words are formed, the standard bag-of-words representation along with commonly used classifiers, such as the naïve Bayes, can be used to classify a document. We report here results from classifying biomedical documents that were previously used in the TREC Genomics track, employing the image-based representation. Efforts are ongoing to improve image-based classification and analyze the relationships between text and images. The goal is to develop a new set of features to supplement current text-based features.
    Citations (8)
    Text is the most common form of storing information. When a search engine processes a query, the user obtains a large collection of text data, not all of which is relevant to the required information. It is therefore necessary to organise this massive amount of text data. Text mining is mainly concerned with analysing and processing text data. It uses the standard data mining methods of classification and clustering. These two methods are used to arrange documents, which are usually represented by hundreds or thousands of words. Text in a document can be represented using various representation methods. In this paper, we present a study of various research papers that explore the area of text mining, including different document representation methods and their impact on clustering and classification results.
    Document Clustering
    Representation
    Concept mining
    Citations (18)
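One widely used document representation of the kind the abstract above refers to is TF-IDF weighting. The abstract does not say which representations the surveyed papers use, so the following is only an illustrative sketch; the function name `tf_idf` and the sample documents are assumptions, and the weighting is the plain variant (relative term frequency times the log of inverse document frequency, without smoothing).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document for a list of token lists."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Sample documents (assumption) for demonstration only.
docs = [
    ["text", "mining"],
    ["text", "clustering"],
    ["text", "classification"],
]
vectors = tf_idf(docs)
```

A term that appears in every document, such as `text` here, receives weight zero, so it contributes nothing when documents are compared for clustering or classification.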
    Although there is much research on text classification based on vector spaces using word information from the whole text, humans can generally recognize the field of a text by finding specific words. This paper describes what a field-associated term is and how to discover field-associated terms, which exist in any text. In this paper, such words, which can be directly related to field classification, are called field association (FA) words. Five criteria for FA terms are defined for hierarchical fields. All FA terms are stored in a field tree, which is used to extract field-coherent passages for document classification. The presented approach is evaluated by simulation on 140 text files from the sports field and extended to 197 text files from the civil engineering field.
    Document classification
    Tree (set theory)
    Citations (1)
    The study and application of text data mining is one of the most important problems in data mining. In this paper, we study a method of text data mining. We first discuss the significance and importance of text data mining, and present the definition of text mining and some types of text classification. We then give the key theory of text classification in detail, including data processing, character mining, character denoting, and character matching. Finally, we report experimental results obtained with a simple system based on the text classification method. These results indicate that the method is feasible.
    Concept mining
    Citations (0)
    Data mining is one of the most popular technologies for information management. Text documents are among our most important data. We can sort and classify these text documents using data mining techniques. Text classification is a technique for sorting text documents. The basic steps in text classification are document preprocessing, feature extraction/selection, learning algorithm selection, and evaluation. When classifying text documents using computer systems and machine learning techniques, one of the most important steps is selecting a learning algorithm. In this paper we review research on some effective learning algorithms and present the results of the review in table form.
    Data pre-processing
    Document classification
    Feature (linguistics)
    Citations (0)
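The last of the four basic steps listed in the abstract above, evaluation, is commonly reported as accuracy together with per-class precision and recall. The abstract does not name the metrics the reviewed studies use, so the sketch below is an assumption-laden illustration; the function name `evaluate` and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred, positive):
    """Accuracy over all labels, plus precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (assumption) for demonstration only.
y_true = ["spam", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham"]
report = evaluate(y_true, y_pred, positive="spam")
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since a classifier can reach high accuracy by always predicting the majority class.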