Mining frequent itemsets from data streams under the sliding-window model has been extensively studied. This paper presents an algorithm, AFPCFI-DS, for mining frequent itemsets from data streams. The algorithm detects the frequent itemsets using an FP-tree in each sliding window. When processing a new window, the algorithm first updates the header table and then modifies the FP-tree according to the changed items in the header table. The algorithm also adopts a local updating strategy to avoid the time-consuming operation of searching the whole tree when adding or deleting transactions. Our experimental results show that the algorithm is more efficient and has lower time and memory costs than the algorithms Moment and FPCFI-DS.
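The abstract does not specify the AFPCFI-DS data structures in detail, so the following is only a minimal sketch of the bookkeeping it describes: a header table of item supports maintained over a sliding window, with updates touching only the path of the transaction being added or removed. The class and method names (SlidingWindowFPTree, add_transaction) are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict, deque

class FPNode:
    """A node of a simplified FP-tree: an item, a count, and child links."""
    def __init__(self, item=None, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

class SlidingWindowFPTree:
    """Sketch: keep an FP-tree over the last `window_size` transactions.

    The header table maps each item to its support in the window; the
    local-update idea is approximated by adjusting only the path of the
    transaction being added or removed, never scanning the whole tree.
    """
    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.header = defaultdict(int)   # item -> support in current window
        self.root = FPNode()

    def _update_path(self, transaction, delta):
        # Walk (or build) the path for one transaction and adjust counts.
        node = self.root
        for item in sorted(transaction):
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += delta
            self.header[item] += delta

    def add_transaction(self, transaction):
        self.window.append(transaction)
        self._update_path(transaction, +1)
        if len(self.window) > self.window_size:   # slide the window
            old = self.window.popleft()
            self._update_path(old, -1)

    def frequent_items(self, min_support):
        return {i: c for i, c in self.header.items() if c >= min_support}

tree = SlidingWindowFPTree(window_size=3)
for t in [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]:
    tree.add_transaction(t)
print(tree.frequent_items(min_support=2))
```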
Mining frequent closed itemsets from data streams is an important topic. In this paper, we propose an algorithm for mining frequent closed itemsets from data streams based on a time-fading model. By dynamically constructing a pattern tree, the algorithm calculates the densities of the itemsets in the pattern tree using a fading factor. The algorithm deletes truly infrequent itemsets from the pattern tree so as to reduce memory cost. A density threshold function is designed to identify the truly infrequent itemsets that should be deleted; with this density threshold function, deleting the infrequent itemsets does not affect the result of frequent itemset detection. The algorithm modifies the pattern tree and detects the frequent closed itemsets at fixed time intervals so as to reduce computation time. We also analyse the error caused by deleting the infrequent itemsets. The experimental results indicate that our algorithm achieves higher accuracy and needs less memory and computation time than comparable algorithms.
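A worked sketch of the fading-factor density update described above is given below. The decay constant, the form of the pruning threshold, and the per-itemset table are illustrative assumptions; the paper's exact density threshold function is not given in the abstract.

```python
class FadingDensityTable:
    """Sketch: per-itemset density under a time-fading factor.

    When an itemset is observed at time t, its density is decayed from its
    last update time and then incremented:
        d(t) = lambda_ ** (t - t_last) * d(t_last) + 1
    Itemsets whose decayed density falls below `min_density` are pruned;
    this fixed threshold stands in for the paper's threshold function.
    """
    def __init__(self, lambda_=0.9, min_density=1.0):
        self.lambda_ = lambda_
        self.min_density = min_density
        self.table = {}          # frozenset(itemset) -> (density, last_time)

    def observe(self, itemset, t):
        key = frozenset(itemset)
        density, last = self.table.get(key, (0.0, t))
        self.table[key] = (density * self.lambda_ ** (t - last) + 1.0, t)

    def prune(self, t):
        """Delete itemsets whose decayed density is below the threshold."""
        dead = [k for k, (d, last) in self.table.items()
                if d * self.lambda_ ** (t - last) < self.min_density]
        for k in dead:
            del self.table[k]

    def frequent(self, t, min_support):
        return {k for k, (d, last) in self.table.items()
                if d * self.lambda_ ** (t - last) >= min_support}

table = FadingDensityTable(lambda_=0.9, min_density=1.0)
for t, itemset in enumerate([{"a"}, {"a"}, {"a", "b"}, {"a"}]):
    table.observe(itemset, t)
table.prune(t=4)
print(table.frequent(t=4, min_support=1.5))
```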
Text mining and analysis of power grid safety hidden danger records, together with in-depth study of the standards and norms for grid hidden danger investigation, can help power grid enterprises carry out hidden danger management efficiently and conveniently. First, the study explains the scope of the hidden danger problems faced by power grid enterprises and the direction of the research, summarizes the general process and methods of text mining, and reviews the current state of research on text mining for grid hidden danger investigation. Second, we examine the textual characteristics of grid hidden danger investigation records and, by text mining 412 existing grid hidden danger texts, obtain visualization results as well as the main manifestations of grid hidden dangers. Finally, we discuss the difficulties of text mining for grid hidden danger investigation and possible directions for future development.
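As a toy illustration of the frequency-counting step that typically underlies such keyword visualizations, the sketch below counts term occurrences across a few placeholder reports. The sample texts and stopword list are invented for the example; real grid hidden danger reports are Chinese and would first be segmented with a tokenizer such as jieba.

```python
from collections import Counter
import re

def keyword_frequencies(documents, stopwords=frozenset()):
    """Count keyword occurrences across hidden-danger reports.

    Whitespace/word splitting is used here only for the English placeholders;
    Chinese reports would need word segmentation before counting.
    """
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"\w+", doc.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

reports = [
    "transmission line insulator crack near tower 12",
    "vegetation too close to transmission line corridor",
    "insulator contamination found during inspection",
]
print(keyword_frequencies(reports, stopwords={"to", "too", "near", "the"}).most_common(5))
```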
Background: The essential proteins in protein networks play an important role in complex cellular functions and in protein evolution. Therefore, identifying essential proteins in a network can help explain the structure, function, and dynamics of basic cellular networks. Existing dynamic protein networks treat the protein components as identical at all time points; however, the role of a protein can vary over time. Results: To improve the accuracy of identifying essential proteins, an improved h-index algorithm based on an attenuation coefficient method is proposed in this paper. This method incorporates previously neglected node information to improve the accuracy of the essential protein search. It maintains the accuracy of the identified proteins while identifying more essential proteins. Conclusions: The described experiments show that this method is more effective than other similar methods in identifying essential proteins in dynamic protein networks. This study can better explain the mechanisms of life activities and provide a theoretical basis for the research and development of targeted drugs.
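A minimal sketch of the idea of a node-level h-index combined with an attenuation coefficient across network snapshots follows. The exponential weighting, the snapshot representation, and the function names are assumptions for illustration, not the paper's exact formulation.

```python
def h_index(values):
    """Classic h-index: the largest h such that at least h values are >= h."""
    values = sorted(values, reverse=True)
    h = 0
    for i, v in enumerate(values, start=1):
        if v >= i:
            h = i
        else:
            break
    return h

def attenuated_h_index(snapshots, protein, alpha=0.8):
    """Illustrative score: the protein's h-index in each network snapshot,
    weighted by an attenuation coefficient so older snapshots count less.

    `snapshots` is a list (oldest first) of adjacency dicts:
        {protein: set_of_neighbours}.
    The exponential weighting with `alpha` is an assumption for the sketch.
    """
    T = len(snapshots)
    score = 0.0
    for t, adj in enumerate(snapshots):
        neighbours = adj.get(protein, set())
        degrees = [len(adj.get(n, set())) for n in neighbours]
        score += (alpha ** (T - 1 - t)) * h_index(degrees)
    return score

# Toy dynamic network with two time points.
g1 = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
g2 = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
print(attenuated_h_index([g1, g2], "B"))
```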
Mining frequent itemsets from data streams is an important task in stream data mining. This paper presents an algorithm, Stream_FCI, for mining frequent closed itemsets from data streams under the sliding-window model. The algorithm detects the frequent closed itemsets in each sliding window using a DFP-tree with a header table. When processing a new transaction, the algorithm updates the header table and modifies the DFP-tree according to the changed items in the header table. The algorithm also uses a table to store the frequent closed itemsets so as to avoid the time-consuming operation of searching the whole DFP-tree when adding or deleting transactions. Our experimental results show that our algorithm is more efficient and has lower time and memory costs than the similar algorithms Moment and FPCFI-DS.
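For reference, the sketch below computes the frequent closed itemsets of one window by brute force, using the standard definition that an itemset is closed if no proper superset has the same support. It is not the DFP-tree or the closed-itemset table of Stream_FCI; it only illustrates what the algorithm maintains incrementally.

```python
from itertools import combinations
from collections import Counter, deque

def closed_frequent_itemsets(window, min_support):
    """Naive reference: enumerate itemsets in the current window, keep the
    frequent ones, and retain only those with no proper superset of equal
    support (the closedness condition)."""
    support = Counter()
    for transaction in window:
        items = sorted(transaction)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                support[frozenset(combo)] += 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    closed = {}
    for s, c in frequent.items():
        if not any(s < other and c == frequent[other] for other in frequent):
            closed[s] = c
    return closed

window = deque([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], maxlen=3)
print(closed_frequent_itemsets(window, min_support=2))
```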
Influenced by a combination of factors such as diet, living habits, and the external environment, lung cancer has become a high-incidence, high-risk disease worldwide. To support the prevention and treatment of lung cancer, a question-answering system based on a knowledge graph is built to provide intelligent auxiliary diagnosis and treatment, quickly and accurately answering the questions raised by patients. In this article, lung cancer medical case data are used to construct a lung cancer knowledge graph, and a question dataset is built from templates and augmented to increase its volume and diversity. We then build a multi-task learning model that performs question intent recognition and question entity recognition at the same time; the model learns the relationships between the tasks, improves the performance of both, and shortens training and inference time, effectively reducing training cost. Finally, the model analyses each question to obtain its entities and relations, and the answer is retrieved from the knowledge graph.
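The abstract does not describe the architecture, so the following is only a minimal sketch of a shared-encoder multi-task model with an intent-classification head and a token-level entity-tagging head. The BiLSTM encoder, layer sizes, and label counts are placeholder assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskQAModel(nn.Module):
    """Shared BiLSTM encoder with two task heads: one classifies the intent
    of the whole question, the other tags each token for entity recognition.
    All sizes and label counts are placeholders for the sketch."""
    def __init__(self, vocab_size, n_intents, n_entity_tags,
                 emb_dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, n_intents)      # sentence level
        self.entity_head = nn.Linear(2 * hidden, n_entity_tags)  # token level

    def forward(self, token_ids):
        x = self.embed(token_ids)                  # (batch, seq, emb)
        enc, _ = self.encoder(x)                   # (batch, seq, 2*hidden)
        intent_logits = self.intent_head(enc.mean(dim=1))  # (batch, n_intents)
        entity_logits = self.entity_head(enc)      # (batch, seq, n_entity_tags)
        return intent_logits, entity_logits

model = MultiTaskQAModel(vocab_size=5000, n_intents=6, n_entity_tags=9)
tokens = torch.randint(1, 5000, (2, 12))           # a dummy batch of questions
intent_logits, entity_logits = model(tokens)
# Joint training would sum a cross-entropy loss on the intent with a
# cross-entropy loss over the per-token entity tags.
```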
We tackle the problem of answering maximum probabilistic top-k tuple set queries. We use a sliding-window model on uncertain data streams and present an efficient algorithm for processing sliding-window queries on uncertain streams. In each sliding window, the algorithm forms candidate sets from different numbers of the highest-scoring tuples and, within each set, selects the k tuples with the highest probabilities. It then computes the existential probability of each candidate top-k set and chooses the set with the highest probability as the top-k query result. We theoretically prove the correctness of the algorithm. Our experimental results show that our algorithm requires less time and space than other existing algorithms.
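A sketch of this candidate-set search over one window is given below. The abstract does not define the existential-probability formula, so the product form used here (the chosen k tuples exist while the other tuples in the score prefix do not) is an assumption, as are the tuple representation and function name.

```python
from collections import deque
from math import prod

def best_topk_tuple_set(window, k):
    """Sketch: each tuple is (score, probability).  For every prefix of the
    m highest-scoring tuples (m = k, k+1, ...), pick the k tuples with the
    highest probabilities, score the candidate set by an assumed existential
    probability, and return the best candidate set."""
    ranked = sorted(window, key=lambda t: t[0], reverse=True)
    best_set, best_prob = None, -1.0
    for m in range(k, len(ranked) + 1):
        prefix = ranked[:m]
        chosen = sorted(prefix, key=lambda t: t[1], reverse=True)[:k]
        rest = [t for t in prefix if t not in chosen]
        prob = prod(p for _, p in chosen) * prod(1 - p for _, p in rest)
        if prob > best_prob:
            best_set, best_prob = chosen, prob
    return best_set, best_prob

window = deque([(9.0, 0.3), (8.5, 0.9), (7.0, 0.8), (6.5, 0.6)], maxlen=4)
print(best_topk_tuple_set(window, k=2))
```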