In text mining, document clustering aims to reduce the effective document size by constructing a clustering model, which is essential in many web-based applications. Over the past few decades, various mining approaches have been analysed and evaluated to enhance document clustering; in most cases, however, the documents are poorly organized, which degrades performance and reduces accuracy. The data instances need to be organized, and a productive summary has to be generated for every cluster. The summary, or description, of a document should convey its information to users without any further analysis and should make the associated clusters easier to scan. This is achieved by identifying the most relevant and influential features used to generate each cluster. This work proposes a novel approach called the Productive Feature Selection and Document Clustering (PFS-DocC) model. First, productive features are selected from the input, the benchmark dataset DUC2004. Next, document clustering is performed for single and multiple clusters, where the generated output is required to be extractive and generic. The model produces appropriate summaries that are well suited to web-based applications. Experiments on the publicly available benchmark dataset show that the proposed PFS-DocC model yields superior results with higher ROUGE scores.
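The abstract does not specify how "productive" features are scored; as one illustrative possibility only (the TF-IDF weighting, the function names, and the toy documents below are my assumptions, not the paper's PFS-DocC method), features can be ranked by their TF-IDF weight and the top terms kept as the clustering vocabulary:

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Score each term by its maximum TF-IDF weight across documents
    (an illustrative stand-in for 'productive feature selection')."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = {}
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            w = (count / len(doc)) * math.log(n / df[term])
            scores[term] = max(scores.get(term, 0.0), w)
    return scores

def select_features(docs, k=4):
    """Keep the k highest-scoring terms as the clustering vocabulary."""
    scores = tfidf_scores(docs)
    return sorted(scores, key=scores.get, reverse=True)[:k]

docs = [
    "the summit talks ended".split(),
    "the summit hosted trade talks".split(),
    "the match ended in a draw".split(),
]
print(select_features(docs, k=4))
```

Terms that appear in every document (such as "the") receive an IDF of zero and are never selected, which is the intended pruning effect before clustering.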
In today's technology-driven era, many firms publish descriptions of their services and products. Such textual information contains structured information hidden beneath the unstructured text. This structure can be exposed as structured metadata by recognizing documents that are likely to have certain attribute types; the metadata is then used for both segregation and search. The idea is to identify attributes of a text that match a query object, serving as identifiers for segregation as well as for storage and retrieval. An adaptive technique is proposed that selects relevant attributes to annotate a document so as to satisfy users' querying needs. The proposed solution to the annotation-attribute suggestion problem is not based on a probabilistic or predictive model; instead, it is based on the basic keywords a user would use to query a database to retrieve a document. Experimental results show that the Querying Value and Content Value approach is effective for predicting a tag for a document; basing prediction on these two values greatly improves the utility of shared data, a drawback of the existing system. This approach differs in that only the basic keywords are matched with the content of a document. Compared with existing approaches, clarity is a primary goal, as the annotator is expected to improve the annotations over time. The discovered tags assist retrieval as an alternative to bookmarking.
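The abstract names Querying Value and Content Value but not their exact formulas; the sketch below is only one plausible reading (the function name, the product scoring rule, and the sample query log are my assumptions): rank candidate tags by how often a keyword appears in past user queries (querying value) multiplied by how often it appears in the document itself (content value).

```python
from collections import Counter

def suggest_tags(document, query_log, top_k=2):
    """Rank candidate tags by Querying Value (frequency of the keyword
    in past user queries) times Content Value (frequency of the keyword
    in the document). The product rule is an illustrative choice, not
    the paper's exact formula."""
    content = Counter(document.lower().split())
    querying = Counter(w for q in query_log for w in q.lower().split())
    scores = {w: querying[w] * content[w] for w in content if querying[w] > 0}
    return [w for w, _ in Counter(scores).most_common(top_k)]

doc = "wireless router setup guide for home network router"
queries = ["router setup", "home wifi", "router firmware"]
print(suggest_tags(doc, queries))
```

Keywords never queried (e.g. "guide") score zero and are never suggested, matching the abstract's point that prediction follows the basic keywords users actually query with.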
Machine learning (ML) and deep learning (DL) are used in numerous fields, particularly to develop effective intrusion detection systems (IDS). Existing wireless-network IDS rely on a single ML algorithm and have limitations: a high false-positive rate, difficulty in recognizing distinct attack patterns, and a high acquisition cost for annotated training datasets. Because hostile threats are always evolving, networks need an intelligent security solution. Compared with other ML approaches, DL algorithms are more successful at intrusion detection. This paper presents a DL-based ensemble model that combines Multi-verse with Chaotic Atom Search Optimization (MCA) for preprocessing, which eliminates unsolicited and recurrent information in the dataset. Optimized feature selection uses Principal Component Analysis (PCA) and Chaotic Manta-ray Foraging Optimization (CMFO), and a grouping method partitions the optimized feature dataset into k diverse clusters. The proposed model then stacks a Support Vector Machine (SVM) as the ensemble's meta-learner, pre-training the hybrid DL prototypes on the optimized feature-dataset clusters. The key components of the hybrid DL prototype are the CNN-LSTM and CNN-GRU models, which integrate Convolutional Neural Networks (CNN) with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. The proposed model's performance is compared with six ML techniques (NB, SVM, J48, RF, MLP, and kNN) using accuracy, precision, recall, and F-measure. Evaluated on the publicly available Aegean Wi-Fi Intrusion Dataset (AWID), the proposed model outperforms contemporary models in the literature.
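The stacking arrangement described above can be sketched with scikit-learn. This is only a structural illustration: the synthetic data stands in for the optimized AWID feature clusters, and simple classifiers (a random forest and a logistic regression, my substitutions) stand in for the CNN-LSTM and CNN-GRU base learners; what the sketch preserves from the paper is the SVM stacked on top as the meta-learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary attack/benign data standing in for the AWID features.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners stand in for the hybrid CNN-LSTM / CNN-GRU prototypes;
# an SVM is stacked on top as the meta-learner, as in the paper.
stack = StackingClassifier(
    estimators=[
        ("m1", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("m2", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=SVC(),
)
stack.fit(X_tr, y_tr)
print(f"test accuracy: {stack.score(X_te, y_te):.3f}")
```

The meta-learner sees the base learners' predictions (produced via internal cross-validation) rather than the raw features, which is what lets the ensemble correct individual base-model errors.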
In this digital age, large amounts of information, images, and videos can be found in web repositories. These repositories include personal, historic, cultural, and business event images. Image mining is a limited research field in which most techniques process images rather than mine them. Very few tools exist for mining these images, especially 3D images. Open-source image datasets are unstructured, making query-based retrieval difficult. Techniques that extract visual features from these datasets yield low precision because images lack proper descriptions, numerous samples exist for the same image, or the images are in 3D. This work proposes an extraction scheme for retrieving cultural artefacts based on voxel descriptors. Image anomalies are eliminated with a new clustering technique, and the 3D images are used to reconstruct cultural artefact objects. Corresponding cultural 3D images are grouped for optimized performance of a 3D reconstruction engine. Density-based spatial clustering techniques such as Particle Varied Density Based Spatial Clustering of Applications with Noise (PVDBSCAN) eliminate image outliers; PVDBSCAN is selected in this work for its ability to handle a variety of outliers. Information-theoretic clustering is also used to identify image views of cultural objects, which are then reconstructed using 3D motion. The proposed scheme is benchmarked against Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to demonstrate its efficiency. Evaluation on a dataset of about 31,000 cultural heritage images retrieved from internet collections with many outliers indicates that the proposed method is more robust and cost-effective for reliable, just-in-time 3D reconstruction than existing state-of-the-art techniques.
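The outlier-elimination step shared by DBSCAN and its PVDBSCAN variant rests on a density test: a point with too few neighbours within a radius is noise. The sketch below implements only that noise-point test in NumPy (it is not the full PVDBSCAN algorithm, and the 2D toy descriptors, radius, and threshold are my assumptions for illustration).

```python
import numpy as np

def density_outliers(points, eps=1.0, min_pts=3):
    """Flag points with fewer than min_pts neighbours within radius eps.
    This is only the DBSCAN-style noise test, not the full PVDBSCAN
    algorithm described in the paper."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbour_counts = (d <= eps).sum(axis=1) - 1   # exclude self
    return neighbour_counts < min_pts               # True = outlier

# A dense cluster of toy image descriptors plus one far-away outlier.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), [[10.0, 10.0]]])
mask = density_outliers(pts, eps=0.5, min_pts=3)
print(mask.sum(), "outlier(s) flagged")
```

Filtering with `points[~mask]` before grouping image views keeps the 3D reconstruction engine from being fed anomalous images, which is the role the clustering step plays in the proposed pipeline.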
Problem statement: Document clustering is one of the most important areas of data mining and is currently the subject of significant global research, since it underpins web intelligence, web mining, web search-engine design, and so forth. Generative models based on the multivariate Bernoulli and multinomial distributions have been widely used for text classification. Approach: This study explores the suitability of a probabilistic algorithm based on the multivariate Bernoulli model for text clustering. In a multivariate Bernoulli model, a document is represented as a binary vector over the space of words, with 0 and 1 indicating whether each word occurs in the document; the number of occurrences is not considered, so word-frequency information is lost in this representation. In this work, we propose an FFT-based transformation technique to improve the clustering performance of the multivariate Bernoulli model. The transformation converts the actual term-frequency counts into a time-domain signal, so the weight of each word's frequency is distributed throughout each row of records. If the multivariate Bernoulli model is then applied to the signs of these values (less than zero versus greater than zero), performance improves, since no frequency information is lost in this representation. Results: Bernoulli-model-based clustering and the improved version are implemented and evaluated using suitable metrics, and the results are reported. Conclusion: The transformation technique significantly improves the document-clustering performance of the multivariate Bernoulli model.
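The transform-then-binarize idea can be sketched in a few lines of NumPy. The abstract does not specify which FFT variant is applied, so the inverse FFT and the sign threshold below are my assumptions; the point the sketch preserves is that after the transform, every component of the binary vector carries a share of the original frequency weights rather than mere word presence.

```python
import numpy as np

# Term-frequency counts for one document (toy vocabulary of 8 terms).
tf = np.array([3, 0, 1, 0, 5, 2, 0, 1], dtype=float)

# Treat the count vector as a frequency-domain spectrum and map it to a
# time-domain signal; the real part of the inverse FFT is used here
# (the paper does not state which FFT variant it applies).
signal = np.fft.ifft(tf).real

# Bernoulli-style binarization on the sign of the transformed values:
# the 0/1 pattern now depends on the counts, not just word presence.
binary = (signal > 0).astype(int)
print(binary)
```

Under the plain Bernoulli representation this document would binarize to `[1, 0, 1, 0, 1, 1, 0, 1]` regardless of whether a word occurred once or five times; after the transform, changing any count changes the whole signal and hence the 0/1 pattern.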
Incorporating phase-change material (PCM) in building envelopes is a promising energy-saving technique, because a PCM container maintains a nearly constant temperature while storing and releasing energy. PCM is used in building envelopes to buffer solar irradiation, reducing heat penetration and lowering heating and cooling loads. Selecting the right PCM for a given climate is vital, because the ambient air temperature fluctuates considerably throughout the year and PCM thermal effectiveness depends strongly on local ambient weather conditions; designing a PCM-integrated building for year-round thermal regulation is therefore challenging. This research examines the thermal energy effectiveness of PCM in building walls for decreasing interior temperature variance in a composite climate, using numerical simulation with the EnergyPlus software.