Allele-specific transcription factor binding as a benchmark for assessing variant impact predictors

2018 
Genetic variation has long been known to alter transcription factor binding sites, sometimes with major phenotypic consequences. While the performance of current binding site predictors is well characterised, little is known about how well these models predict the impact of variants. We collected and curated over 132,000 potential allele-specific binding (ASB) ChIP-seq variants across 101 transcription factors (TFs). We then assessed the accuracy of TF binding models from five different methods on these high-confidence measurements, finding that deep learning methods performed best yet still have room for improvement. Importantly, machine learning methods were consistently better than the venerable position weight matrix (PWM). Finally, predictions for certain TFs were consistently poor, and our investigation supports efforts to use features beyond sequence, such as methylation, DNA shape, and post-translational modifications. We submit that ASB data is a valuable benchmark for variant impact on TF binding.
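To make the PWM baseline concrete, the following is a minimal sketch of one common way to score a variant with a PWM: compute the best log-odds match over all windows overlapping the variant for the reference and the alternate allele, then take the difference. It is not the scoring procedure used in the paper. The count matrix, sequences, pseudocount, and uniform background are hypothetical values chosen for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4 bp motif (rows = positions, columns = A, C, G, T).
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 1, 0],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to a log2-odds PWM against a uniform background (assumed)."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm


def best_window_score(pwm, seq, variant_index):
    """Best PWM score over forward-strand windows that overlap the variant."""
    width = len(pwm)
    first = max(0, variant_index - width + 1)
    last = min(variant_index, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        best = max(best, score)
    return best


def delta_score(pwm, ref_seq, alt_seq, variant_index):
    """Alternate minus reference score; negative means predicted weaker binding."""
    return (best_window_score(pwm, alt_seq, variant_index)
            - best_window_score(pwm, ref_seq, variant_index))


PWM = log_odds(COUNTS)
REF = "TTACGTAA"  # hypothetical reference sequence containing the motif ACGT
ALT = "TTACTTAA"  # same sequence with a G>T substitution at index 4
DELTA = delta_score(PWM, REF, ALT, 4)
print(f"delta score (alt - ref): {DELTA:.3f}")
```

In a benchmark of this kind, the sign and magnitude of such a delta score would be compared against the allele that ChIP-seq reads favour at each ASB variant. A fuller implementation would also scan the reverse complement.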