Analysis of B-cell receptor repertoires in COVID-19 patients using deep embedded representations of protein sequences
2021
Analyzing B-cell receptor (BCR) repertoires is a valuable way to evaluate an individual’s immunological status. Conventional repertoire analysis methods focus on comprehensive assessment of clonal composition, including V(D)J segment usage, nucleotide insertions and deletions, and amino acid distribution. Here, we introduce a novel computational approach that applies deep-learning-based protein embedding techniques to analyze BCR repertoires. By selecting the most frequently occurring BCR sequences in a given repertoire and summing their vector representations, we represent an entire repertoire as a 100-dimensional vector, that is, as a single data point in vector space. We demonstrate that this approach enables us not only to accurately cluster repertoires of COVID-19 patients and healthy subjects, but also to efficiently track minute changes in immune status as patients undergo a course of treatment over time. Furthermore, using the distributed representations, we trained an XGBoost classification model that achieved a mean accuracy of over 87% given a repertoire of CDR3 sequences.
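The repertoire-to-vector step described above can be illustrated with a minimal sketch. The abstract does not name the embedding model, so the code substitutes a toy stand-in: each overlapping 3-mer is mapped to a deterministic pseudo-random 100-dimensional vector, and a sequence is embedded as the sum of its 3-mer vectors. The function names (`kmer_vector`, `embed_sequence`, `embed_repertoire`), the `top_n` parameter, the 3-mer choice, and the example CDR3 strings are all hypothetical and chosen only for illustration; a real pipeline would replace `kmer_vector` with vectors from a trained protein embedding model.

```python
import hashlib
import random
from collections import Counter

DIM = 100  # vector dimensionality stated in the abstract


def kmer_vector(kmer, dim=DIM):
    """Toy stand-in for a learned embedding (assumption, not the authors' model):
    a deterministic pseudo-random vector seeded by a hash of the k-mer."""
    seed = int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed_sequence(seq, k=3, dim=DIM):
    """Embed one amino-acid sequence as the sum of its overlapping k-mer vectors."""
    vec = [0.0] * dim
    for i in range(len(seq) - k + 1):
        for j, x in enumerate(kmer_vector(seq[i:i + k], dim)):
            vec[j] += x
    return vec


def embed_repertoire(sequences, top_n=3, dim=DIM):
    """Sum the embeddings of the top_n most frequent sequences in a repertoire."""
    counts = Counter(sequences)
    # Sort by descending count, then alphabetically, so ties resolve deterministically.
    top = sorted(counts, key=lambda s: (-counts[s], s))[:top_n]
    vec = [0.0] * dim
    for seq in top:
        for j, x in enumerate(embed_sequence(seq, dim=dim)):
            vec[j] += x
    return vec


# Made-up CDR3-like strings, used only to exercise the functions.
repertoire = ["CARDYW", "CARDYW", "CARDYW", "CAKGGW", "CAKGGW", "CTRDLW"]
point = embed_repertoire(repertoire, top_n=2)
print(len(point))
```

Each repertoire then becomes one fixed-length point, so repertoires of different sizes can be compared, clustered, or passed to a classifier as ordinary feature vectors. The sum here is unweighted, matching the abstract's wording; whether the original method weights sequences by frequency is not stated.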