Analysis of high dimensional data using pre-defined set and subset information, with applications to genomic data

Wenge Guo, Mingan Yang, Chuanhua Xing, Shyamal D. Peddada

2012

Background: Based on available biological information, genomic data can often be partitioned into pre-defined sets (e.g. pathways) and subsets within sets. Biologists are often interested in determining whether some pre-defined sets of variables (e.g. genes) are differentially expressed under varying experimental conditions. Several procedures are available in the literature for making such determinations; however, they do not take into account information regarding the subsets within each set. Moreover, variables (e.g. genes) belonging to a set or a subset are potentially correlated, yet this correlation is often ignored and univariate methods are used instead. This may result in a loss of power and/or an inflated false positive rate.

Keywords:

Clustering high-dimensional data
Gene expression profiling
False positive rate
Sample mean and sample covariance
Shrinkage estimator
Univariate
Bioinformatics
Computer science
Data mining
Nominal level
Multiple comparisons problem
DNA microarray
Covariance
Genomics