An outlier is an observation that deviates markedly from the other observations in a data set. Outlier detection has been studied extensively over the past decades, and identifying outliers can lead to the discovery of unexpected and useful knowledge in areas such as credit card fraud detection, calling card fraud detection, discovering criminal behavior, and detecting computer intrusions. However, most existing research focuses on algorithms designed for a specific application background, while comparative studies of outlier detection approaches remain rare. Many sophisticated data mining methods address this problem to some extent, but not fully, and could be improved by addressing it more directly. This paper discusses and compares different outlier detection approaches from a data mining perspective, grouped into statistical-based, distance-based, density-based, and information-theoretic approaches.
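As a rough illustration of two of the four families named above, the sketch below flags outliers with a z-score test (statistical-based) and scores points by the distance to their k-th nearest neighbour (distance-based). The function names, the threshold, and `k` are illustrative assumptions, not parameters taken from the paper.

```python
import math
import statistics


def zscore_outliers(values, threshold=3.0):
    """Statistical approach: return indices whose |z-score| exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def knn_distance_scores(points, k=2):
    """Distance-based approach: score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(zscore_outliers(values, threshold=2.0))  # [9]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_distance_scores(points, k=2)
print(scores.index(max(scores)))  # 4, the isolated point
```

With only ten values the largest attainable z-score is below 3, which is why the example lowers the threshold to 2; in practice the cut-off depends on the data size and distribution.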
With the growth of digital libraries and video databases, it has become increasingly important to mine knowledge and information from video databases automatically. Many video mining approaches have been proposed for extracting useful knowledge from video databases. Nevertheless, finding the intended information in a video clip or video database remains a difficult and laborious task because of the semantic gap between low-level video features and high-level semantic concepts. We survey previous papers that present various data mining applications, functionalities, and video features.
Web usage mining is an important type of web mining that analyzes log files to extract information about how users use a website, that is, to find out what users are looking for on the Internet. Some users look only at textual data, whereas others may be interested in multimedia data. A web log file is created and maintained automatically by the web server. Much research has been done in this field; this paper focuses on predicting users' future requests from web log records or user information. Its main aim is to provide an overview of past and current research on user future request prediction using web usage mining.
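One common way to predict a user's next request from log data is a first-order Markov model over page transitions. The sketch below assumes the log has already been cleaned and split into per-user sessions (lists of requested URLs); the session format and function names are hypothetical, and this is not necessarily the method used in the surveyed work.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often each page is followed by each other page within a session."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, page):
    """Return the most frequent successor of `page`, or None if it was never followed."""
    if page not in counts:  # membership test does not create a defaultdict entry
        return None
    return counts[page].most_common(1)[0][0]


sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/checkout"],
    ["/home", "/about"],
]
counts = build_transition_counts(sessions)
print(predict_next(counts, "/home"))  # /products
```

Higher-order models (conditioning on the last two or three pages) usually predict better but need more data per state.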
The rapidly growing demand for computational power has led to the creation of large-scale data centers. They consume enormous amounts of electrical power, resulting in high operational costs and carbon dioxide emissions. Moreover, modern cloud computing environments have to provide high Quality of Service (QoS) to their customers, which requires dealing with the power-performance trade-off. We propose an efficient power management policy for virtualized cloud data centers. The parametric constraints based automatic power saving system (PCBAPS) is an extended form of automatic power saving (APS). In PCBAPS, we impose parametric constraints on virtual machine migration that can be adjusted dynamically to balance server workloads efficiently, so that migration cost is reduced and energy savings are achieved.
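The abstract does not specify PCBAPS's constraints, so the following is only a hypothetical sketch of the general idea of constraint-driven migration: adjustable upper and lower CPU-utilization thresholds (`upper`, `lower`, both assumptions) decide which VMs leave an overloaded host and which underloaded hosts are emptied so they can be powered down. Choosing target hosts is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                           # total CPU capacity (e.g. MIPS)
    vms: dict = field(default_factory=dict)   # VM name -> CPU demand

    def utilization(self):
        return sum(self.vms.values()) / self.capacity


def select_migrations(hosts, upper=0.8, lower=0.2):
    """Return (vm, source_host) pairs; `upper`/`lower` are the tunable constraints."""
    plan = []
    for host in hosts:
        u = host.utilization()
        if u > upper:
            # Overloaded: move the largest VMs first so that few migrations are needed.
            load = sum(host.vms.values())
            for vm, demand in sorted(host.vms.items(), key=lambda kv: kv[1], reverse=True):
                if load / host.capacity <= upper:
                    break
                plan.append((vm, host.name))
                load -= demand
        elif 0 < u < lower:
            # Underloaded: migrate everything so the host can be switched off.
            plan.extend((vm, host.name) for vm in host.vms)
    return plan


hosts = [
    Host("h1", 100, {"a": 50, "b": 30, "c": 10}),
    Host("h2", 100, {"d": 10}),
    Host("h3", 100, {"e": 50}),
]
print(select_migrations(hosts))  # [('a', 'h1'), ('d', 'h2')]
```

Adjusting `upper` and `lower` at run time trades consolidation (energy) against the risk of QoS violations on busy hosts.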
Phonetics is the study of the speech sounds that occur in all human languages, and it plays an important role in improving our communication. It also accounts for the sound changes made for ease of pronunciation. This paper reviews the phonetic algorithms Soundex, Metaphone and Double Metaphone, and the Match Rating Approach. Future work would include experimenting with different variations of these approaches.
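To make the idea of phonetic encoding concrete, here is a minimal sketch of the American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent letters with the same digit (also across `h` and `w`, but not across vowels), and pad to four characters. It is an illustration of the classic algorithm, not code from the paper.

```python
# Soundex digit for each consonant group; vowels, y, h and w have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code of `name`."""
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    result = name[0].upper()
    prev = CODES.get(name[0], "")  # the first letter's digit still blocks repeats
    for ch in name[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:
                result += code
            prev = code
        elif ch not in "hw":
            prev = ""  # a vowel (or y) separates letters with the same digit
        if len(result) == 4:
            break
    return result.ljust(4, "0")


for word in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
    print(word, soundex(word))
```

Names that sound alike, such as "Robert" and "Rupert", receive the same code (R163), which is what makes Soundex useful for fuzzy name matching.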
Data mining, a branch of computer science, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Research paper text retrieval refers to text retrieval techniques applied to research articles and the literature of different research papers. The volume of published research papers in different areas, especially in computer science, and therefore the underlying knowledge base, is expanding at an increasing rate. By discovering predictive relationships between different pieces of extracted data, data mining algorithms can be used to improve the accuracy of information. This paper presents a technique that uses a soft clustering data mining algorithm to increase the accuracy of text extraction. In the proposed work, text extraction accuracy is increased by using KEA with a dictionary approach and the Artificial Bee Colony algorithm.
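The abstract does not say which soft clustering algorithm is used. Fuzzy c-means is a standard example of soft clustering, in which each item receives a degree of membership in every cluster rather than a single hard label; the sketch below applies it to small numeric vectors standing in for document features. The parameters (`m`, `iters`, `seed`) and the toy data are assumptions for illustration only.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: every point gets a membership degree in each of `c` clusters."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are means weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships are recomputed from relative distances to the centers.
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            zeros = dists.count(0.0)
            if zeros:  # the point coincides with one or more centers
                u[i] = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
                continue
            u[i] = [1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1)) for k in range(c))
                    for j in range(c)]
    return centers, u


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(points, c=2)
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

The fuzzifier `m` controls how soft the assignment is: values close to 1 approach hard k-means, larger values spread membership more evenly.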
Wireless Sensor Networks face several issues, and routing is a major one because it directly affects energy consumption. To increase network lifetime, energy must be consumed efficiently. This paper describes various routing protocols and their features. We first describe the types of routing protocols in wireless sensor networks and then explain the hierarchical routing protocols LEACH, PEGASIS, TEEN, APTEEN, and CCPAR.
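As an example of how hierarchical protocols spread the energy cost of being a cluster head, LEACH lets each eligible node elect itself when a uniform random draw falls below the threshold T(n) = P / (1 − P · (r mod 1/P)), where P is the desired fraction of cluster heads and r is the round; nodes that have already served in the current epoch have T(n) = 0. The sketch below implements only this election step; the function names and usage loop are illustrative.

```python
import random


def leach_threshold(p, r, eligible):
    """LEACH threshold T(n) for round r.

    p        -- desired fraction of nodes acting as cluster heads per round
    eligible -- True if the node has not yet been a cluster head this epoch
    """
    if not eligible:
        return 0.0
    epoch = round(1 / p)  # every node gets one turn per 1/p rounds
    return p / (1 - p * (r % epoch))


def elect_cluster_heads(node_ids, p, r, recent_heads, rng=random):
    """Each eligible node becomes a cluster head if a uniform draw falls below T(n)."""
    heads = []
    for n in node_ids:
        if rng.random() < leach_threshold(p, r, n not in recent_heads):
            heads.append(n)
    return heads


rng = random.Random(1)
recent_heads = set()
for r in range(40):
    if r % 20 == 0:
        recent_heads.clear()  # new epoch: all nodes are eligible again
    heads = elect_cluster_heads(range(100), 0.05, r, recent_heads, rng)
    recent_heads.update(heads)
```

Because T(n) rises toward 1 in the last round of an epoch, every node that has not yet served is forced to become a cluster head once per epoch, which balances energy drain across nodes.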
Background/Objectives: The development of various communication media has created several problems in retrieving information. The objective of this study is to analyze the performance of retrieving heterogeneous data. Methods/Statistical Analysis: A model was run to simulate the retrieval of heterogeneous data from several servers. The information was distributed with different loads, and the servers were selected randomly. Performance was analyzed in terms of response time and CPU utilization. Several load balancing techniques were applied to distribute the load among the servers, and their impact on overall system performance is discussed. Findings: Data retrieval requires high speed, so response times must be very short. Retrieving heterogeneous data is challenging when servers are under high load. Without load balancing, some servers handled the entire load while the others were underutilized, and response time degraded drastically under high data loads. With load balancing applied, the results were compared and presented, and they showed an improvement in overall performance. Improvements/Applications: The load balancing techniques, applied using several approaches, improve the distribution of server load and therefore overall performance. Keywords: Analysing Performance, Big Data Environment, Heterogeneous Information
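The abstract does not name the load balancing techniques it compares. As a hedged illustration, the sketch below contrasts two common policies, round-robin and least-loaded assignment, on a list of request sizes; the class names, the `pick` interface, and the request sizes are assumptions.

```python
import itertools


class RoundRobin:
    """Assign requests to servers in a fixed rotating order, ignoring load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, loads):
        return next(self._cycle)


class LeastLoaded:
    """Assign each request to the server with the smallest current load."""

    def pick(self, loads):
        return min(loads, key=loads.get)  # ties go to the first server listed


def distribute(requests, servers, balancer):
    """Return the total load per server after assigning every request."""
    loads = {s: 0 for s in servers}
    for size in requests:
        loads[balancer.pick(loads)] += size
    return loads


servers = ["a", "b", "c"]
requests = [5, 1, 1, 1, 1, 1]
print(distribute(requests, servers, RoundRobin(servers)))  # {'a': 6, 'b': 2, 'c': 2}
print(distribute(requests, servers, LeastLoaded()))        # {'a': 5, 'b': 3, 'c': 2}
```

Least-loaded assignment keeps the busiest server lighter here because it reacts to the large first request, whereas round-robin keeps sending work to server `a` regardless of its load.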
Text clustering divides a text collection into small clusters such that documents within the same cluster are as similar as possible. The technique was introduced in the area of text mining. Its two main goals are high performance (efficiency) and high cluster quality, that is, clusters that are close to the documents' natural classes. Text clustering is an important research direction for obtaining useful information quickly and accurately from large volumes of information. The k-means clustering algorithm has limitations: it depends on the initial cluster centers and requires the number of clusters to be fixed in advance. For these reasons, a text clustering algorithm based on latent semantic analysis and optimization is proposed. The new algorithm, based on PBO and optimization, effectively addresses the high-dimensionality and sparsity problems and overcomes k-means' dependence on the number of clusters and the initial cluster centers.
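The dependence of k-means on its initial centers can be seen on a tiny example. The sketch below is plain k-means (not the proposed PBO-based method) run twice on the four corners of a 4 × 1 rectangle: one initialization finds the natural left/right split, the other gets stuck in a worse top/bottom split. The data and function names are illustrative.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain k-means starting from the given centers; returns (centers, clusters)."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[j].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def sse(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, centers[j]) ** 2 for j, cl in enumerate(clusters) for p in cl)


points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good = kmeans(points, [(0, 0), (4, 0)])
bad = kmeans(points, [(0, 0), (0, 1)])
print(sse(*good), sse(*bad))  # 1.0 16.0
```

Both runs converge, but to different local optima, which is exactly the sensitivity to initialization that motivates alternatives to k-means.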
Text mining is a field that extracts useful, previously undiscovered information from text documents according to users' needs. Text classification is one of the text mining tasks for managing information efficiently: documents are assigned to classes using classification and clustering algorithms. Each text document is characterized by a set of features used by the text classification method, and these features should be relevant to the task. This paper introduces preprocessing techniques and feature selection methods for classifying Punjabi text documents with clustering and classification algorithms.