How to improve efficiency of analysis of sequential data

Witold Andrzejewski,Zbyszko Królikowski,Tadeusz Morzy

2009

Many of today's database applications, including market basket analysis, web log analysis, and DNA and protein sequence analysis, utilize databases to store and retrieve sequential data. Commercial database management systems allow sequential data to be stored, but they do not support efficient querying of such data. To increase the efficiency of analysis of sequential data, new index structures need to be developed. In this paper we propose an indexing scheme for non-timestamped sequences of sets, which supports set subsequence queries. Our contribution is threefold. First, we describe the logical and physical structure of the index; second, we provide algorithms for set subsequence queries utilizing this structure; and finally, we perform an experimental evaluation of the index, which demonstrates its feasibility and advantages in set subsequence query processing.

Many current database applications process complex data types such as sets, sequences, time series, objects, semistructured data and graphs. Such application domains include, but are not limited to, bioinformatics, market basket analysis, web server event logging and stock price analysis. In bioinformatics, strings of symbols representing either DNA or protein sequences are processed. The analysis is based on finding sequences or subsequences similar to the query sequences. Market basket analysis is based on the analysis of either sets of bought items or sequences of sets of items bought by a single customer over some period of time. Queries issued in market basket analysis are in most cases subset queries or set subsequence queries. Web server logs are sequences of timestamped events

The paper is sponsored by The Polish Ministry of Science and Higher Education, grant
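To make the notion of a set subsequence query concrete, the sketch below shows a naive, index-free check in Python. It assumes the definition commonly used in sequential pattern mining, which this excerpt does not state explicitly: a query sequence of sets matches a data sequence if each query set is contained in a distinct data set and the order of the sets is preserved. The function name `is_set_subsequence` and the sample purchase history are hypothetical illustrations, not part of the proposed index.

```python
from typing import Sequence, Set


def is_set_subsequence(query: Sequence[Set[str]], data: Sequence[Set[str]]) -> bool:
    """Return True if `query` is a set subsequence of `data`.

    Assumed definition: there are positions i1 < i2 < ... < im in `data`
    such that query[j] is a subset of data[ij] for every j.
    """
    pos = 0  # next position in `data` that may still be matched
    for q in query:
        # Greedily advance to the earliest remaining data set containing q.
        while pos < len(data) and not q <= data[pos]:
            pos += 1
        if pos == len(data):
            return False  # no remaining data set contains q
        pos += 1  # each data set may match at most one query set
    return True


# Hypothetical purchase history of one customer: one set of items per visit.
history = [{"bread", "milk"}, {"beer"}, {"milk", "diapers", "beer"}]

found = is_set_subsequence([{"milk"}, {"beer"}], history)
not_found = is_set_subsequence([{"beer"}, {"bread"}], history)
```

Answering a query this way requires scanning every stored sequence, which is the cost that a dedicated index structure aims to avoid.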
