The past 20 years have seen tremendous growth and evolution in the cytopathology of organs located below the diaphragm, with much of the progress centered on the pancreas. Cancer Cytopathology has been at the forefront of reporting advances in pancreaticobiliary cytology, especially for the diagnosis of pancreatic cysts.
Abstract Somatic variant calling involves the identification of genomic alterations that occur in somatic cells and requires deep coverage to achieve high sensitivity for low-frequency variants. Characterizing somatic variants across the entire genome therefore benefits from novel cost-efficient sequencing platforms, such as UG100. Here, we present the optimization of variant calling tools for short and structural variants in WGS and WES data from UG100. For calling short variants, we optimized DeepVariant (DV) for somatic calling using data from matched tumor-normal sample pairs, improving variant calling accuracy and reducing pipeline running time (by up to 10-fold). We defined somatic variant calling as deciding whether a pileup image containing reads from the tumor and normal samples represents a true somatic variant (rather than a germline variant or an artifact). The challenge of robust variant calling with deep learning models is exacerbated in somatic calling, where sequencing depth and coverage variability are typically high. Our optimized DV overcomes these challenges through several data sampling strategies. First, allele-frequency-preserving down-sampling reduces the randomness of read sub-sampling in high-coverage regions. Second, alternative-allele prioritization samples alt-allele-supporting reads first, allowing variants to be called at very high-coverage loci without sacrificing sensitivity or computational efficiency. Finally, a Panel-of-Normals based on targeted WES data further improves precision for this assay type. We used these strategies to train two models: one for tumor characterization using WGS (T/N coverage: 40x-150x/40x-100x) and one for deep WES (T/N coverage: >500x/>120x). We called variants on simulated tumors using the WGS model. For VAF>10%, the model showed SNV recall >98% and indel recall >95%, with a false-positive rate of 0.2/Mb. For VAFs of 5-10%, SNV recall was 86% and indel recall was 67%.
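The two read-sampling strategies described above can be illustrated with a minimal Python sketch. It is not the optimized DV code. The read representation, the fixed `max_reads` budget and the function names (`af_preserving_downsample`, `alt_prioritized_sample`) are hypothetical. The sketch only shows how stratified, allele-aware sampling differs from filling the read budget with alt-supporting reads first.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical read record: (read_name, supports_alt_allele).
Read = Tuple[str, bool]


def split_by_allele(reads: Sequence[Read]) -> Tuple[List[Read], List[Read]]:
    """Partition reads into alt-supporting and ref-supporting lists."""
    alt = [r for r in reads if r[1]]
    ref = [r for r in reads if not r[1]]
    return alt, ref


def af_preserving_downsample(reads: Sequence[Read], max_reads: int,
                             rng: random.Random) -> List[Read]:
    """Stratified sub-sampling: keep the alt fraction close to the observed VAF
    instead of letting it drift with a purely random draw."""
    if len(reads) <= max_reads:
        return list(reads)
    alt, ref = split_by_allele(reads)
    n_alt = round(max_reads * len(alt) / len(reads))  # proportional to the VAF
    n_ref = max_reads - n_alt
    return rng.sample(alt, n_alt) + rng.sample(ref, n_ref)


def alt_prioritized_sample(reads: Sequence[Read], max_reads: int,
                           rng: random.Random) -> List[Read]:
    """Fill the read budget with alt-supporting reads first, then ref reads."""
    if len(reads) <= max_reads:
        return list(reads)
    alt, ref = split_by_allele(reads)
    kept_alt = rng.sample(alt, min(len(alt), max_reads))
    kept_ref = rng.sample(ref, max_reads - len(kept_alt))
    return kept_alt + kept_ref


if __name__ == "__main__":
    rng = random.Random(0)
    # 2,000 reads at one locus with VAF = 1% (20 alt reads), hypothetical budget of 100.
    reads = [(f"ref{i}", False) for i in range(1980)] + [(f"alt{i}", True) for i in range(20)]
    for fn in (af_preserving_downsample, alt_prioritized_sample):
        kept = fn(reads, 100, rng)
        print(fn.__name__, sum(r[1] for r in kept), "alt of", len(kept))
```

With a hypothetical budget of 100 reads at a 2,000x locus carrying a 1% variant, the allele-preserving draw keeps one alt read, whereas prioritization keeps all 20. Prioritization inflates the apparent allele fraction within the window. A real model would therefore need the true VAF from elsewhere, for example as a separate feature. The abstract does not say how this is handled.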
To demonstrate the utility of our somatic variant calling, we applied the models to call somatic variants in well-characterized cancer cell lines: COLO829, HCC1395 and HCC1143. Results showed F1>90% for variants with VAF>10%. On simulated tumors, the WES model reliably called variants at VAF>5%, with an average SNV recall of 99% (precision >99%) and indel recall >86% (precision >94%). To analyze structural and copy-number variations, we optimized the assembly engine of GRIDSS to enable fast calling of structural variations and demonstrated that Control-FREEC can be used to call copy number variants. SV calling on COLO829/COLO829BL achieved sensitivity >95%. In conclusion, our research highlights the utility of UG100 in oncology, demonstrating its capacity for comprehensive and precise somatic variant detection in both WGS and WES data. Citation Format: Doron Shem-Tov, Maya Levy, Gil Hornung, Ilya Soifer, Hila Benjamin, Ariel Jaimovich, Adam Blattler, William Brandler, Robert Sugar, Isaac Kinde, Omer Barad, Doron Lipson. Advancements in somatic variant calling from UG100 whole genome and whole exome sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4926.
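The recall, precision and F1 figures above come from comparing a call set with a truth set. Below is a minimal Python sketch of that comparison. It assumes variants are already normalized and matched exactly on (chromosome, position, ref, alt). Dedicated benchmarking tools also reconcile differing representations of the same variant, which this sketch does not attempt. The function name `benchmark` is hypothetical. Because F1 is the harmonic mean of precision and recall, the WES SNV figures (recall 99%, precision >99%) correspond to an F1 of about 0.99 or higher.

```python
from typing import Dict, Iterable, Tuple

# (chrom, pos, ref, alt); assumes calls and truth are already normalized.
Variant = Tuple[str, int, str, str]


def benchmark(calls: Iterable[Variant], truth: Iterable[Variant]) -> Dict[str, float]:
    """Exact-match comparison of a call set against a truth set."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # called and in the truth set
    fp = len(call_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - call_set)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 300, "G", "A"), ("chr2", 400, "T", "TA")]
    calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 300, "G", "A"), ("chr3", 500, "A", "C")]
    print(benchmark(calls, truth))
```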
Abstract Comprehensive characterization of tumor evolution is essential for understanding the drivers of metastasis and treatment resistance, which are still largely unknown. Studying evolution and resistance requires large cohorts of patients; for each patient, a comprehensive phylogenetic tree is used to follow clones over time and space. Deep whole-genome sequencing (WGS) enables the creation of more robust phylogenies, since (i) each clone is characterized by far more clone-specific mutations (compared to whole-exome or panel sequencing); and (ii) WGS yields more accurate clonal and sub-clonal allelic copy-number estimates, enabling more precise estimation of the fraction of cancer cells that harbor each mutation. Given the recent development of more affordable WGS by Ultima Genomics, we set out to test our ability to construct phylogenetic trees based on Ultima WGS. Ultima sequencing has demonstrated high sensitivity and specificity for detecting germline polymorphisms, but its performance on somatic mutation calling has not yet been rigorously studied. First, we adjusted our somatic mutation detection pipeline to work with Ultima-generated WGS data and tested its performance using well-studied tumor/normal cell line pairs. We assembled a ground truth set of somatic mutations based on deep sequencing data from multiple platforms for the breast cancer cell line pairs (HCC1954/HCC1954-BL and HCC1143/HCC1143-BL) and the melanoma tumor/normal cell line pair (COLO829/COLO829-BL). We found that mutation detection performance for single nucleotide variations was on par with other platforms. Next, we identified 32 patients from various cancer types (breast, lung, cholangiocarcinoma, and melanoma) treated with a variety of therapies. We collected between 4 and 14 samples per patient at autopsy (303 in total) and sequenced them with Ultima to an average coverage of 63x (range 48x-91x).
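To see why better copy-number estimates sharpen the estimated fraction of cancer cells harboring a mutation, consider the standard relation between variant allele fraction (VAF), tumor purity, local copy number and mutation multiplicity. Inverting it gives a point estimate of the cancer cell fraction (CCF). The Python sketch below is a simplified illustration under that textbook model, not the PhylogicNDT implementation. Full pipelines typically model uncertainty rather than relying on a single point estimate. Function and parameter names are hypothetical.

```python
def vaf_from_counts(alt_reads: int, depth: int) -> float:
    """Observed variant allele fraction at a site."""
    return alt_reads / depth if depth else 0.0


def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Invert VAF = purity * multiplicity * CCF / (purity * tumor_cn + (1 - purity) * normal_cn)."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be >= 1")
    # Cell-weighted average number of copies of this locus in the sample (tumor + normal).
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


if __name__ == "__main__":
    print(round(cancer_cell_fraction(0.30, purity=0.6, tumor_cn=2), 3))  # diploid region
    print(round(cancer_cell_fraction(0.15, purity=0.6, tumor_cn=2), 3))  # diploid region
    print(round(cancer_cell_fraction(0.30, purity=0.6, tumor_cn=4, multiplicity=2), 3))
```

At 60% purity in a diploid region, a VAF of 0.30 corresponds to CCF ≈ 1.0 (clonal), and a VAF of 0.15 to CCF ≈ 0.5 (subclonal). An error in the tumor copy number propagates directly through `total_copies`. This is why more accurate copy-number calls from WGS translate into more precise CCF estimates.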
We identified mutations using our Ultima-adjusted WGS pipeline and constructed phylogenetic trees using our PhylogicNDT suite of tools. We then compared the results to trees previously constructed mostly from whole exomes (and a few whole genomes) sequenced on Illumina. We demonstrate that the trees generated from Ultima WGS data are consistent with the previously generated trees and are much more detailed. Upon commercialization, we anticipate that WGS for projects like this will be performed at ~1/6th the cost of current NGS offerings (on par with current WES costs). This lower cost will enable larger studies and more accurate phylogenetic reconstruction, advancing the study of tumor evolution and resistance. Citation Format: Julian Hess, Elizabeth Martin, Ilya Soifer, Hila Benjamin, Mendy Miller, Carrie Cibulskis, Brian P. Danysh, Matthew Coole, Stacey Gabriel, Dejan Juric, Doron Lipson, Gad Getz. Phylogenetic reconstruction across 303 metastatic tumor samples using Ultima whole-genome sequencing dramatically increases subclonal resolution [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 213.
Abstract Whole Genome Sequencing (WGS) has emerged as a pivotal tool for unraveling the intricate genomic alterations underlying cancer. It enables the detection of a broad spectrum of somatic changes in cancer genomes, including single nucleotide variants, insertions, deletions, copy number variations, and structural rearrangements. Characterizing somatic variants across the entire genome benefits significantly from novel cost-efficient sequencing platforms, such as UG100. In the current study, we aim to demonstrate the utility of UG100 WGS in characterizing the landscape of somatic events in breast and lung cancer. More than 35 tumor-normal sample pairs were profiled using UG100 WGS (mean tumor coverage: >100x; normal: >50x). Variant calling was performed with a somatic DeepVariant-based algorithm optimized for UG100 data. Results were compared with Illumina sequencing and showed high concordance: over 97% of SNVs and over 93% of indels detected by Illumina were also identified by UG100. UG100 sequencing was deeper and allowed identification of a larger number of genomic variations, including known driver events in cancer-related genes. The somatic mutational signatures from UG100 were concordant with those from the Illumina platform, with a cosine similarity above 0.99. Copy number variation and structural variation analyses were performed on all sample pairs, revealing a wide range of genomic aberrations. The integration of genetic sequencing into routine clinical oncology practice has proven instrumental in identifying actionable mutations, predicting treatment responses, and monitoring minimal residual disease. Moving from targeted sequencing to WGS provides a more comprehensive and unbiased assessment of the entire genome, uncovering rare or unexpected mutations that might be missed by targeted approaches.
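The signature comparison above relies on cosine similarity between mutation spectra. Below is a minimal Python sketch using the common 96-channel single-base-substitution (SBS) classification. It assumes each SNV is supplied with its trinucleotide context on the forward strand; looking that context up in the reference genome is outside the scope of the sketch. The abstract does not state which signature tools the authors used, and the function names here are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Iterable, List, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 96 channels: 6 pyrimidine-centred substitutions x 16 (5' base, 3' base) pairs,
# in the conventional C>A, C>G, C>T, T>A, T>C, T>G order.
CHANNELS: List[str] = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def sbs96_profile(snvs: Iterable[Tuple[str, str]]) -> List[int]:
    """Count SNVs per channel. Each SNV is (forward-strand trinucleotide, alt base)."""
    counts: Counter = Counter()
    for context, alt in snvs:
        if context[1] in "AG":  # purine reference: report on the opposite strand
            context, alt = reverse_complement(context), alt.translate(_COMPLEMENT)
        counts[f"{context[0]}[{context[1]}>{alt}]{context[2]}"] += 1
    return [counts[ch] for ch in CHANNELS]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    forward = sbs96_profile([("ACG", "T"), ("TCA", "T")])
    reverse = sbs96_profile([("CGT", "A"), ("TGA", "A")])
    print(len(CHANNELS), round(cosine_similarity(forward, reverse), 6))
```

Each SNV is folded onto the pyrimidine strand, so a G>A change in the context TGA is counted as T[C>T]A. As a result, equivalent spectra recorded from either strand compare with a cosine similarity of 1.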
WGS allows for the simultaneous evaluation of both coding and non-coding regions of the genome, providing a deeper understanding of the regulatory elements influencing gene expression. This broader scope enhances our understanding of cancer biology and contributes to a more accurate assessment of the tumor's functional landscape. Citation Format: Hila Benjamin, Ilya Soifer, Doron Shem-Tov, Maya Levy, Islam Oguz Tuncay, Baek-Lok Oh, Jong-Yeon Shin, Young Seok Ju, Sangmoon Lee, Omer Barad, Doron Lipson. Characterizing the genomic landscapes of breast and lung tumors using cost-effective whole genome sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2946.
Abstract UG100 is a novel next-generation sequencing platform that combines high throughput with significantly lower sequencing cost. Previous studies have demonstrated broad applicability of UG100 data for whole-genome germline variant calling, single-cell transcriptomics and whole-genome methylation analysis, as well as for detecting cancer signatures in cfDNA at very low circulating tumor DNA fractions. Somatic variant calling is a natural application for this platform, as lower sequencing cost enables deeper sequencing coverage. Here, we describe the implementation and evaluation of a somatic calling pipeline for UG100 whole-genome sequencing data. Deep-learning-based variant calling methods currently outperform statistical methods for germline variant calling on UG100 data, so we cast somatic variant calling as a classification problem. Specifically, we trained a classifier to determine whether a candidate at a particular location is a somatic variant or a sequencing error. We used a version of DeepVariant optimized for UG100 data to train the deep-learning classifier in three scenarios: tumor only, tumor with an unmatched background sample, and matched tumor-normal samples. The labeled truth set for training was generated by mixing whole-genome-sequenced samples from the Genome in a Bottle (GIAB) project in a wide range of proportions (0-100% mixing ratio) to simulate various allele frequencies, with an average genome coverage of 100x. The tumor/normal model performed best of the three, with recall of >98% for SNPs and 90% for indels at allele fractions >10%. Notably, the model also showed high specificity, with 16 false-positive SNPs and 19 false-positive indels at AF >10% called on a chromosome held out from training (chr20). We then applied the model to WGS data from three well-characterized pairs of matched tumor and normal cell lines: HCC1143, COLO829 and HCC1395.
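Mixing two samples at a known ratio produces predictable allele fractions. This is how a mixing experiment yields labeled examples across a range of VAFs. The Python sketch below makes three assumptions: reads from two diploid genomes are mixed in silico, the read fraction equals the genome fraction, and variants present in sample A but absent from sample B (which also serves as the matched normal) count as simulated somatic variants. The abstract does not detail the exact mixing procedure or labeling rules, so these choices and the function names are hypothetical.

```python
from typing import Tuple


def expected_vaf(frac_a: float, alt_copies_a: int, alt_copies_b: int) -> float:
    """Expected VAF in an in-silico mix of two diploid genomes.

    frac_a: fraction of reads drawn from sample A (0.0-1.0).
    alt_copies_*: number of alt alleles (0, 1 or 2) in each sample.
    """
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must be in [0, 1]")
    return frac_a * alt_copies_a / 2 + (1.0 - frac_a) * alt_copies_b / 2


def is_simulated_somatic(alt_copies_a: int, alt_copies_b: int) -> bool:
    """Hypothetical label: present in A (the 'tumor' component), absent from B (the 'normal')."""
    return alt_copies_a > 0 and alt_copies_b == 0


def subsample_fractions(frac_a: float, target_cov: float,
                        cov_a: float, cov_b: float) -> Tuple[float, float]:
    """Fraction of each source's reads to keep so the mix reaches target_cov."""
    keep_a = frac_a * target_cov / cov_a
    keep_b = (1.0 - frac_a) * target_cov / cov_b
    if keep_a > 1.0 or keep_b > 1.0:
        raise ValueError("source coverage too low for the requested mix")
    return keep_a, keep_b


if __name__ == "__main__":
    for frac in (0.1, 0.2, 0.5):
        # heterozygous-in-A and homozygous-in-A variants, both absent from B
        print(frac, expected_vaf(frac, 1, 0), expected_vaf(frac, 2, 0))
```

For example, a variant heterozygous in A and absent from B has an expected VAF of 0.05, 0.10 and 0.25 at 10%, 20% and 50% mixing, respectively. At 100x coverage, that corresponds to roughly 5, 10 and 25 alt reads.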
We evaluated performance on the predefined UG-HCR (Ultima Genomics High Confidence Region), which covers 95% of the human genome. The DeepVariant models performed well in calling SNPs (>92% recall at allele frequencies above 10%) and indels (>90% recall). The calls were also highly specific, with fewer than 1 call per Mb absent from the ground truth across the UG-HCR. Lastly, we applied the models to 8 unpaired cell lines with known driver mutations and called 34/34 (100% recall) of the COSMIC-listed driver mutations of length ≤20 bp. We expect the UG100 sequencer to become an important tool for somatic genome analysis and to make deep whole-genome sequencing a routine assay in clinical oncology. Citation Format: Maya Levy, Doron Shem-Tov, Hila Benjamin, Sima Benjamin, Ilya Soifer, Shlomit Gilad, Danit Lebanony, Nika Iremadze, Eti Meiri, Doron Lipson, Omer Barad. Calling somatic variants from UG100 data using deep learning [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 3134.
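A false-positive rate per megabase, like the one quoted above for the UG-HCR, divides the number of false-positive calls inside the evaluated region by the region's length in Mb. Below is a minimal Python sketch. It assumes the region is given as BED-style intervals (0-based, half-open) that may overlap and must be merged before the region is measured. The function names and example intervals are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# BED-style interval: (chrom, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def merged_length(intervals: Iterable[Interval]) -> int:
    """Total bases covered, counting overlapping intervals only once."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:  # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:  # overlapping or touching: extend the block
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


def fp_per_mb(false_positives: int, region_bp: int) -> float:
    """False positives per megabase of evaluated region."""
    if region_bp <= 0:
        raise ValueError("region must be non-empty")
    return false_positives / (region_bp / 1_000_000)


if __name__ == "__main__":
    region = [("chr1", 0, 1_000_000), ("chr1", 500_000, 1_500_000), ("chr2", 0, 500_000)]
    size = merged_length(region)
    print(size, fp_per_mb(3, size))
```

Merging matters because overlapping intervals would otherwise be counted twice, which would inflate the denominator and understate the false-positive rate.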
Abstract The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line (HG008-T) and matched normal cells from duodenal tissue (HG008-N-D) and pancreatic tissue (HG008-N-P). The data come from thirteen whole-genome measurement technologies: Illumina paired-end, Element standard and long insert, Ultima UG100, PacBio (HiFi and Onso), Oxford Nanopore (standard and ultra-long), Bionano Optical Mapping, Arima and Phase Genomics Hi-C, G-banded karyotyping, directional genomic hybridization, and BioSkryb Genomics single-cell ResolveDNA. Most tumor data are from a large homogeneous batch of non-viable cells after 23 passages of the primary tumor cells, along with some data from different passages to enable an initial understanding of genomic instability. These data will be used by the GIAB Consortium to develop matched tumor-normal benchmarks for somatic variant detection. In addition, extensive data from two different normal tissues from the same individual can help characterize mosaicism. Long reads also contain methylation tags for epigenetic analyses. We expect these data to facilitate innovation in whole-genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic mutations. This first-of-its-kind, broadly consented, open-access resource will facilitate further understanding of sequencing methods used in cancer biology.