The prediction of human DNase I hypersensitive sites based on DNA sequence information

2020 
Abstract
DNase I hypersensitive sites (DHSs) are special regions of the chromosome with a loose structure that can be recognized, bound, and cleaved by the DNase I enzyme. In these regions the chromatin lacks a condensed structure, which makes the DNA more accessible. DHSs are hallmarks of gene expression regulation, and characterizing them is important both for understanding transcriptional regulatory mechanisms and for locating cis-regulatory elements such as promoters, enhancers, insulators, silencers, and locus control regions. Although many experimental methods have been proposed to identify DHSs, they are time-consuming and expensive, which creates a pressing need for computational methods that predict DHSs. In this study, we describe a sequence-based predictor that identifies DHSs in the human genome. Optimal features were selected by a two-step feature selection algorithm from a large feature set that includes various k-mer nucleotide compositions and correlation information derived from the physicochemical properties of dinucleotides. Under 5-fold cross-validation, the proposed method achieved a Matthews correlation coefficient of 0.66 and an accuracy of 0.87, both higher than those of published DHS predictors, which indicates that the method performs well. The benchmark datasets and trained DHS model are available at https://github.com/Jackie-Suv/iDHS-SVM.
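As a rough illustration only, and not the authors' code from the iDHS-SVM repository, the following Python sketch computes normalized k-mer nucleotide compositions, one of the feature families named above. It also computes the two reported metrics, accuracy and the Matthews correlation coefficient (MCC), from confusion-matrix counts. The function names, the choice of k = 1, 2, 3, and the decision to skip windows containing ambiguous bases such as N are assumptions made for illustration. The dinucleotide physicochemical correlation features, the two-step feature selection, and the classifier itself are not shown.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

BASES = "ACGT"


def kmer_composition(seq: str, k: int) -> List[float]:
    """Normalized frequency of each of the 4**k possible k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence (step 1).
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: windows containing N or other non-ACGT symbols are skipped.
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def feature_vector(seq: str, ks: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Concatenate k-mer compositions for several k (hypothetical choice of k values)."""
    vec: List[float] = []
    for k in ks:
        vec.extend(kmer_composition(seq, k))
    return vec


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC = 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

For any input sequence, `feature_vector(seq)` returns 4 + 16 + 64 = 84 values, so sequences of different lengths map to vectors of the same size, which a fixed-input classifier such as an SVM requires. Unlike accuracy, MCC stays informative when positive and negative samples are imbalanced, which is presumably why both metrics are reported.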