Abstract The omics era has greatly expanded the repertoire of approaches available for researchers and clinicians to unravel the complexity behind cancer onset in humans: Next Generation Sequencing (NGS) approaches can characterize genomes, epigenomes, transcriptomes and proteomes of patient samples. Advanced DNA barcoding and automated microfluidics can take this to the next level, enabling multiomic characterization of single cells. Peripheral blood mononuclear cells (PBMCs) offer a window into the immune system that, when combined with these omics tools, can provide an insight into immune cells mediating anti-tumor responses in cancer patients. Here we detail a workflow using a single blood draw to rapidly produce a diverse set of multiomics results including genomics, epigenomics, transcriptomics and proteomics. This starts with automated sample handling and processing of the primary blood draw to ensure high viability and yield of PBMCs, along with simultaneous plasma separation and collection. These samples are then aliquoted and simultaneously processed for automated and semi-automated whole exome sequencing, single-cell RNA sequencing, methylation sequencing and Olink proteomics assay. Germline and somatic mutations can be detected using whole exome or whole genome sequencing with deep coverage, whereas methylation, ATAC or ChIP sequencing can be used for epigenetic characterization of the same sample. While bulk expression offers a high-level transcriptomics profile, single-cell transcriptomics facilitates detection of gene expression changes in each individual cell type, allowing for analysis of rare cell types including circulating tumor cells. Olink proteomic assays can be utilized for both biomarker discovery and validation, with highly targeted or broad-spectrum panels. 
With this robust workflow and advanced robotics for sample handling and processing to minimize potential batch effects, all these data types can be produced within days of primary sample collection using minimal sample amounts. High-throughput integrative omics workflows, as described here, are useful in gaining a multidimensional view of cancer and in advancing immunotherapies by characterizing immune cell modulation in tumor progression, and can be expanded for use in tumor/normal analysis, evaluation of metastases, and exploration of the tumor microenvironment. Citation Format: Bhagyashree S. Birla, Andrea O'Hara, Elizabeth Louie, Ben Niu, Haythem Latif, Vanessa Tumilasci, Yang Han, Yongjun Fan, David Corney, Pranay Vishwanath, Wei Wang, Alfredo Staffa, Ilaria DeVito, Laure Turner, Chris Mozdzierz, Peter Nowacki, Ginger Zhou. Harnessing the power of multiomics from a single sample to explore tumor heterogeneity and advancing immuno-oncology research. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 6611.
Abstract Bulk RNA sequencing (RNA-seq) is regularly employed for transcriptional profiling but provides only a snapshot of the population average. Particularly when studying disease, where cellular heterogeneity is significant, understanding differences among diverse cells is essential, making single-cell resolution a necessity. High-throughput droplet- and microfluidic-based approaches for performing single-cell RNA-seq (scRNA-seq) are invaluable techniques for uncovering changes in cells, such as differential gene expression and epigenetic states, in cancer research. A challenge for scRNA-seq, however, is sample preparation, as tissues typically must be processed immediately after collection. We and others have reported alternative approaches to stabilize samples prior to analysis, such as cryopreservation of cells and tissues and methanol-based fixation of cells. However, the adoption of these approaches is still limited in many settings due to the challenging logistics involved in sample collection. To further broaden the ability of researchers to deploy scRNA-seq, we report our experiences from testing a pre-commercial workflow developed by 10x Genomics to resolve these sample management limitations. The use of paraformaldehyde (PFA) to fix samples at the collection site allows samples to be transported to remote laboratories for downstream processing without sacrificing integrity or data quality. This advance has enabled new possibilities for sample accessibility, throughput, and batched analysis in basic and translational research settings. Furthermore, this approach also allows for analysis of transcriptome-wide gene expression at high sensitivity with simultaneous cellular readouts. Here we report a case study comparing the PFA method to standard approaches, and highlight advantages and potential applications previously not possible with reported sample preparation strategies.
Citation Format: David Corney, Yang Han, Yu Qiu, Yongjun Fan, Michael Stephens, Andrea O'Hara, Laure Turner, Christopher Mozdzierz, Haythem Latif, Ginger Zhou. Evaluation of a novel single-cell RNA sequencing methodology for use in clinical trial settings [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB034.
The reporting and the analysis of current events around the globe have expanded from professional, editor-led journalism all the way to citizen journalism. Nowadays, politicians and other key players enjoy direct access to their audiences through social media, bypassing the filters of official cables or traditional media. However, the multiple advantages of free speech and direct communication are dimmed by the misuse of media to spread inaccurate or misleading claims. These phenomena have led to the modern incarnation of the fact-checker -- a professional whose main aim is to examine claims using available evidence and to assess their veracity. As in other text forensics tasks, the amount of information available makes the work of the fact-checker more difficult. With this in mind, starting from the perspective of the professional fact-checker, we survey the available intelligent technologies that can support the human expert in the different steps of her fact-checking endeavor. These include identifying claims worth fact-checking, detecting relevant previously fact-checked claims, retrieving relevant evidence to fact-check a claim, and actually verifying a claim. In each case, we pay attention to the challenges, directions for future work, and the potential impact on real-world fact-checking.
Training and validation data for the PAN @ SemEval 2019 Task 4: Hyperpartisan News Detection.

The data is split into multiple files. The articles are contained in the files with names starting with "articles-" (which validate against the XML schema article.xsd). The ground-truth information is contained in the files with names starting with "ground-truth-" (which validate against the XML schema ground-truth.xsd).

The first part of the data (filename contains "bypublisher") is labeled by the overall bias of the publisher, as provided by BuzzFeed journalists or MediaBiasFactCheck.com. It contains a total of 750,000 articles, half of which (375,000) are hyperpartisan and half of which are not. Half of the hyperpartisan articles (187,500) are on the left side of the political spectrum and half are on the right. This part is split into a training set (80%, 600,000 articles) and a validation set (20%, 150,000 articles) such that no publisher that occurs in the training set also occurs in the validation set. Similarly, none of the publishers in those sets will occur in the test set.

The second part of the data (filename contains "byarticle") is labeled through crowdsourcing on a per-article basis, and contains only articles for which a consensus among the crowdsourcing workers existed. It contains a total of 645 articles, of which 238 (37%) are hyperpartisan and 407 (63%) are not. We will use a similar (but balanced!) test set. Again, none of the publishers in this set will occur in the test set.

Note that article IDs are only unique within the parts. The collection (including labels) is licensed under a Creative Commons Attribution 4.0 International License.

Acknowledgements: Thanks to Jonathan Miller for his assistance in cleaning the data!
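As a minimal sketch of how the ground-truth files might be consumed, the following Python snippet parses a ground-truth-style XML fragment and maps article IDs to their hyperpartisan labels. Note that the element and attribute names used here (article, id, hyperpartisan, bias) and the load_labels helper are assumptions based on the description above, not taken from the actual ground-truth.xsd schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical ground-truth snippet mirroring the labeling described above;
# the real files validate against ground-truth.xsd, whose exact element and
# attribute names may differ.
sample = """
<articles>
  <article id="0000001" hyperpartisan="true" bias="left"/>
  <article id="0000002" hyperpartisan="false"/>
  <article id="0000003" hyperpartisan="true" bias="right"/>
</articles>
"""

def load_labels(xml_text):
    """Map each article ID to a boolean hyperpartisan label."""
    root = ET.fromstring(xml_text)
    return {a.get("id"): a.get("hyperpartisan") == "true"
            for a in root.iter("article")}

labels = load_labels(sample)
print(sum(labels.values()), "hyperpartisan out of", len(labels))
```

For the full files, ET.parse (or an incremental parser such as ET.iterparse, given the 750,000-article size of the bypublisher part) would replace ET.fromstring.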