Testing the significance of patterns with complex null hypotheses

2012 
Aalto University, P.O. Box 11000, FI-00076 Aalto
www.aalto.fi

Author: Niko Vuokko
Name of the doctoral dissertation: Testing the Significance of Patterns with Complex Null Hypotheses
Publisher: School of Science
Unit: Department of Information and Computer Science
Series: Aalto University publication series DOCTORAL DISSERTATIONS 11/2012
Field of research: Computer and Information Science
Manuscript submitted: 20 September 2011
Manuscript revised: 17 November 2011
Date of the defence: 18 February 2012
Language: English
Monograph Article dissertation (summary + original articles)

Abstract

In data mining, large amounts of data are searched for useful information, pieces of which are called patterns. Significance testing is an important part of this task, as the patterns found need to be assessed for their relevance and significance before further action is taken. Advances in science have brought along the need to evaluate the significance of complicated data patterns within complicated datasets. Significance testing has historically been conducted with specialized methods that cannot be adapted to new applications, and many of these methods have problems with their theoretical justification. This thesis suggests using the framework of property-based randomization for building reliable and flexible significance testing tools that can be adapted and extended for a wide variety of applications. The concepts of representation-based randomization and iterative pattern mining are also discussed as ways to enlarge the scope of these tools. The final chapter of the thesis reviews the use of these general ideas in various applications such as databases and time series collections. The publications of the thesis are discussed along with selected introductions to other randomization methods that have been proposed.