A Frequent Term-Based Multiple Clustering Approach for Text Documents
Citations: 7 | References: 10
Keywords: Document Clustering, Clustering high-dimensional data, Brown clustering
Clustering mechanisms for high-dimensional data are emerging in response to the difficulties posed by noisy, low-quality information. Many current clustering algorithms become ineffective when their essential similarity measure is computed between data points in a high-dimensional space. To address this, various projection-based clustering algorithms have been proposed, but most of them struggle when clusters lie in subspaces of very low dimensionality. To this end, the partition-based Improved Clustering Large Applications (ICLARA) mechanism is employed. It is an extension designed to deal with data comprising many objects, reducing computing time and RAM storage requirements. The proposed method considers various representations and provides the most suitable clustering as the result, enabling it to work with large datasets. The proposed approach is compared with the hierarchical CURE (Clustering Using REpresentatives) and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) approaches and with Partitional Distance-Based Projected Clustering (PDBPC). We also measured the accuracy of all clustering techniques with respect to their parameter configurations. The experimental results show that the proposed Improved CLARA algorithm provides better accuracy than the previous methods.
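CLARA-style algorithms make PAM (k-medoids) tractable on large datasets by running it on small random samples and keeping the medoid set that scores best on the full data. The abstract does not give the ICLARA details, so the following is only a minimal sketch of the underlying CLARA idea in plain Python; the function names and parameters (`n_samples`, `sample_size`) are illustrative assumptions, not the paper's implementation.

```python
import random

def dist(a, b):
    # Euclidean distance between two points given as tuples
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(data, medoids):
    # Sum of each point's distance to its nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in data)

def pam(sample, k, iters=20):
    # Naive PAM: start from random medoids, greedily accept improving swaps
    medoids = random.sample(sample, k)
    best = total_cost(sample, medoids)
    for _ in range(iters):
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                c = total_cost(sample, trial)
                if c < best:
                    medoids, best, improved = trial, c, True
        if not improved:
            break
    return medoids

def clara(data, k, n_samples=5, sample_size=40, seed=0):
    # CLARA: run PAM on several small random samples and keep the
    # medoid set with the lowest cost over the FULL dataset.
    random.seed(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = random.sample(data, min(sample_size, len(data)))
        medoids = pam(sample, k)
        cost = total_cost(data, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids
```

Because each PAM run touches only a sample, the expensive swap search never scans the whole dataset, which is the source of the computing-time and memory savings the abstract refers to.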
Clustering, an unsupervised learning process, is a challenging problem, and the quality of the clustering result determines the usefulness of the overall structure. In this article, we propose an incremental stream of hierarchical clustering that improves the efficiency and accuracy of a text categorization algorithm, and reduces its time consumption, by forming exact sub-clusters. We propose a new method called multilevel clustering, which is a combination of supervised and unsupervised techniques for forming the clustering. In this method we form four levels of clustering, building on existing clustering algorithms. We develop and discuss algorithms for the multilevel clustering method to achieve the best clustering results.
Advances made to traditional clustering algorithms solve problems such as the curse of dimensionality and the sparsity of data with many attributes. The traditional H-K clustering algorithm resolves the randomness and a-priori choice of the initial centers in the K-means clustering algorithm, but when applied to high-dimensional data it suffers from the dimensional-disaster problem owing to its high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve the performance of clustering high-dimensional datasets from different aspects and to different extents, yet each of them improves performance from a single perspective only. The objective of the proposed model is to improve the performance of traditional H-K clustering and to overcome its limitations, namely high computational complexity and poor accuracy on high-dimensional data, by combining three approaches: the subspace clustering algorithm and the ensemble clustering algorithm with the H-K clustering algorithm.
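H-K clustering is commonly described as seeding K-means with the result of a hierarchical clustering step, which removes the random, a-priori choice of initial centers. The sketch below illustrates only that general idea, since the proposed combined model (subspace + ensemble + H-K) is not specified in the abstract; all names here are illustrative.

```python
def euclid(a, b):
    # Euclidean distance between two points (tuples)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def agglomerate(data, k):
    # Centroid-linkage agglomerative clustering, merged down to k clusters
    clusters = [[p] for p in data]
    while len(clusters) > k:
        best, best_d = (0, 1), float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclid(centroid(clusters[i]), centroid(clusters[j]))
                if d < best_d:
                    best, best_d = (i, j), d
        i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

def hk_cluster(data, k, iters=10):
    # H-K idea: the hierarchical step yields deterministic initial centers,
    # then Lloyd's k-means iterations refine them.
    centers = [centroid(c) for c in agglomerate(data, k)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in data:
            idx = min(range(k), key=lambda i: euclid(p, centers[i]))
            groups[idx].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers
```

The O(n^2) merge search in the hierarchical step is exactly the computational bottleneck the abstract attributes to H-K on high-dimensional data.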
Clustering is a process of partitioning data objects into different groups according to some similarity or dissimilarity measure, e.g., a distance criterion. In a high-dimensional dataset, however, all objects are almost equidistant, so a distance criterion fails to group them and becomes meaningless. The literature presents numerous clustering algorithms for high-dimensional datasets that select relevant dimensions and then cluster the objects on the selected dimensions. Because these algorithms produce different clustering results on the same dataset, selecting a clustering algorithm that handles a high-dimensional dataset well is confusing. In this paper, we present a comparative study of conventional feature-selection-based clustering algorithms and propose a new feature-selection-based clustering method, IQRAM (inter-quartile range and median based clustering of high-dimensional datasets). We perform our experiments on two real datasets and analyse the clustering results using five well-known clustering quality measures and Student's t-test. The qualitative results show that IQRAM outperforms ten competitive clustering algorithms.
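The abstract names the ingredients of IQRAM, the inter-quartile range and the median, but not the exact selection rule, so the snippet below is only a plausible illustration of IQR-and-median based dimension selection: keep the dimensions whose spread (IQR) exceeds the median IQR across all dimensions. The rule and all names are assumptions, not the paper's method.

```python
import statistics

def iqr(values):
    # Inter-quartile range: Q3 - Q1
    q = statistics.quantiles(values, n=4)  # [Q1, Q2, Q3]
    return q[2] - q[0]

def select_dimensions(data):
    # data: list of equal-length feature vectors.
    # Keep dimensions whose spread (IQR) exceeds the median IQR,
    # i.e. drop the near-constant dimensions that carry no grouping signal.
    dims = list(zip(*data))
    spreads = [iqr(list(d)) for d in dims]
    threshold = statistics.median(spreads)
    return [i for i, s in enumerate(spreads) if s > threshold]
```

Clustering would then proceed on the selected dimensions only, which is the shared pattern of the feature-selection-based algorithms the paper compares.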
Text clustering divides a text collection into small clusters, requiring similarity within each cluster to be as large as possible. The textual clustering technique was introduced in the area of text mining. The two important goals in text clustering are achieving high performance or efficiency and obtaining highly accurate data clusters that are close to their natural classes, i.e., high textual-document cluster quality. Text clustering is therefore an important research direction for obtaining useful information quickly and accurately from a mass of information. The k-means clustering algorithm has limitations: it depends on the initial clustering centers, and the number of clusters must be fixed in advance. For these reasons, a text clustering algorithm based on latent semantic analysis and optimization is proposed. The new clustering algorithm, based on PBO and optimization, effectively solves the high-dimensionality and sparsity problems and overcomes the k-means algorithm's dependency on the number of clusters and the initial clustering centers.
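The limitation the abstract starts from, that k-means depends on its initial clustering centers, is easy to demonstrate: Lloyd's iterations run from two different initializations can converge to solutions of very different cost. The sketch below shows this on a 1-D toy set; it illustrates only the stated limitation, not the proposed PBO-based algorithm, which the abstract does not specify.

```python
def kmeans_1d(data, centers, iters=20):
    # Plain Lloyd's algorithm on 1-D data from a given initialization
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in data:
            idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[idx].append(x)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    # Within-cluster sum of squared distances: lower is better
    cost = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, cost

data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
_, good = kmeans_1d(data, [0.0, 10.0, 20.0])  # well-spread initial centers
_, bad = kmeans_1d(data, [0.0, 1.0, 2.0])     # all centers in one group
```

With the well-spread initialization the three natural groups are recovered (cost 6.0); starting all centers inside the first group leaves two centers fighting over one cluster and yields a far higher cost, which is exactly the dependency the proposed algorithm aims to remove.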
Although subspace clustering, ensemble clustering, alternative clustering, and multiview clustering are different approaches, motivated by different problems and aiming at different goals, these fields face similar problems. Here we briefly survey these areas from the point of view of subspace clustering and, based on this survey, try to identify problems where the different research areas could learn from each other.