Linking T cell receptor sequence to transcriptional profiles with clonotype neighbor graph analysis (CoNGA)

2020 
Multi-modal single-cell technologies capable of simultaneously assaying gene expression and surface phenotype across large numbers of immune cells have revealed extensive heterogeneity within these complex populations, in both healthy and diseased states. In the case of T cells, these technologies have made it possible to profile clonotype, defined by T cell receptor (TCR) sequence, and phenotype, as reflected in gene expression (GEX) profile, surface protein expression, and peptide:MHC (pMHC) binding, across large and diverse cell populations. These rich, high-dimensional datasets have the potential to reveal new relationships between TCR sequence and T cell phenotype that go beyond identification of features shared by clonally related cells. To uncover these connections in an unbiased way, we developed a graph-theoretic approach, clonotype neighbor-graph analysis ("CoNGA"), that identifies correlations between GEX profile and TCR sequence through statistical analysis of a pair of T cell similarity graphs: one in which cells are linked based on gene expression similarity and another in which cells are linked by similarity of TCR sequence. Applying CoNGA across diverse human and mouse T cell datasets uncovered known and novel associations between TCR sequence features and cellular phenotype, including the classical invariant T cell subsets; a newly defined population of human blood CD8+ T cells expressing the transcription factors HOBIT and HELIOS, NK-associated receptors, and a biased TCR repertoire, potentially representing a previously undescribed lineage of "natural lymphocytes"; a striking association between usage of a specific V-beta gene segment and expression of the EPHB6 gene that is conserved between mouse and human; and TCR sequence determinants of differentiation in developing thymocytes.
As the size and scale of single-cell datasets continue to grow, we expect that CoNGA will prove to be a useful tool for deconvolving complex relationships between TCR sequence and cellular state in single-cell applications.
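To make the two-graph idea concrete, the following is a minimal sketch, not the CoNGA implementation. It assumes precomputed pairwise distance matrices for GEX and TCR, builds a k-nearest-neighbor graph from each, and scores every clonotype by how many neighbors the two graphs share. The hypergeometric tail probability used for scoring is an illustrative assumption, since the abstract specifies only "statistical analysis" of the graph pair. The function names (`knn_neighbors`, `hypergeom_sf`, `neighborhood_overlap_scores`) and the toy one-dimensional data are hypothetical.

```python
from math import comb


def knn_neighbors(dist, k):
    """For each node, return the set of its k nearest other nodes."""
    n = len(dist)
    neighbors = []
    for i in range(n):
        # Sort the other nodes by distance; break ties by index for determinism.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (dist[i][j], j))
        neighbors.append(set(order[:k]))
    return neighbors


def hypergeom_sf(overlap, population, k_a, k_b):
    """P(X >= overlap) when k_b nodes are drawn at random from `population`
    nodes, k_a of which are marked (illustrative null model, an assumption)."""
    total = comb(population, k_b)
    upper = min(k_a, k_b)
    tail = sum(comb(k_a, x) * comb(population - k_a, k_b - x)
               for x in range(overlap, upper + 1))
    return tail / total


def neighborhood_overlap_scores(gex_dist, tcr_dist, k):
    """Return (overlap, p_value) per node, comparing its GEX and TCR neighborhoods."""
    n = len(gex_dist)
    gex_nbrs = knn_neighbors(gex_dist, k)
    tcr_nbrs = knn_neighbors(tcr_dist, k)
    scores = []
    for i in range(n):
        overlap = len(gex_nbrs[i] & tcr_nbrs[i])
        # Each node has n - 1 candidate neighbors in either graph.
        scores.append((overlap, hypergeom_sf(overlap, n - 1, k, k)))
    return scores


if __name__ == "__main__":
    # Toy data: six clonotypes placed on a line in "GEX space" and "TCR space".
    gex = [0, 1, 2, 10, 11, 12]
    tcr = [0, 1, 2, 10, 11, 12]
    gex_dist = [[abs(a - b) for b in gex] for a in gex]
    tcr_dist = [[abs(a - b) for b in tcr] for a in tcr]
    for i, (overlap, p) in enumerate(neighborhood_overlap_scores(gex_dist, tcr_dist, k=2)):
        print(i, overlap, round(p, 3))
```

A small p-value for a clonotype means its GEX and TCR neighborhoods overlap more than expected by chance under this null model. On a real dataset the scores would also need correction for the number of clonotypes tested.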