Cell type specific (CTS) analysis is essential for revealing biological insights obscured in bulk tissue data. However, single-cell (sc) or single-nucleus (sn) resolution data are still cost-prohibitive for large sample sizes. Computational methods that deconvolve bulk tissue data are therefore highly valuable. Here we present EPIC-unmix, a novel two-step empirical Bayesian method that integrates reference sc/snRNA-seq data with bulk RNA-seq data from target samples to improve the accuracy of CTS inference. Comprehensive simulations across three tissues show that EPIC-unmix achieved 4.6% to 109.8% higher accuracy than alternative methods. By applying EPIC-unmix to human bulk brain RNA-seq data from the ROSMAP and MSBB cohorts, we identified multiple genes differentially expressed between Alzheimer's disease (AD) cases and controls in a CTS manner, including 57.4% novel genes not identified using sc/snRNA-seq data of similar sample size, indicating the power of our
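Bulk deconvolution generally starts from a linear mixing model: each gene's bulk expression is approximated as a proportion-weighted sum of cell-type reference profiles. The sketch below illustrates only that generic model and is NOT EPIC-unmix, which is a two-step empirical Bayesian method for inferring CTS expression. It assumes a known genes-by-cell-types signature matrix and estimates proportions by plain least squares (normal equations), clipping negatives and renormalizing. The function names and toy numbers are hypothetical.

```python
# Generic linear mixing model for bulk deconvolution (illustration only,
# NOT the EPIC-unmix algorithm):
#   bulk[g] ~ sum_k proportions[k] * reference[g][k]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_proportions(reference, bulk):
    """Hypothetical helper: reference is genes x cell types; bulk has one value per gene."""
    k = len(reference[0])
    # Normal equations: (R^T R) p = R^T y
    rtr = [[sum(row[i] * row[j] for row in reference) for j in range(k)] for i in range(k)]
    rty = [sum(row[i] * y for row, y in zip(reference, bulk)) for i in range(k)]
    # Proportions cannot be negative; clip, then rescale to sum to one.
    p = [max(v, 0.0) for v in solve(rtr, rty)]
    total = sum(p)
    if total == 0.0:
        raise ValueError("all estimated proportions were non-positive")
    return [v / total for v in p]


# Toy data: three genes, two cell types, bulk mixed at 30% / 70%.
print(estimate_proportions([[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]], [3.7, 7.3, 5.0]))
```

Because the toy bulk vector is an exact 30%/70% mixture, the estimate recovers approximately `[0.3, 0.7]`. Real pipelines typically use constrained solvers (for example, non-negative least squares) and must handle noise and cell-type-specific expression variation, which is the gap methods such as EPIC-unmix aim to address.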
SMR results using sQTLs from GTEx (release 8)

# Data usage policy

When using this data, you must acknowledge the source by citing the publication "Widespread dose-dependent effects of RNA expression and splicing on complex diseases and traits" (https://doi.org/10.1101/814350).

# Disclaimer

The data is provided "as is", and the authors assume no responsibility for errors or omissions. The User assumes the entire risk associated with its use of these data. The authors shall not be held liable for any use or misuse of the data described and/or contained herein. The User bears all responsibility in determining whether these data are fit for the User's intended use. The information contained in these data is not better than the original sources from which they were derived, and both scale and accuracy may vary across the data set. These data may not have the accuracy, resolution, completeness, timeliness, or other characteristics appropriate for applications that potential users of the data may contemplate.

The user is responsible for complying with any data usage policy of the original GWAS studies; refer to the list of traits described [here](https://www.biorxiv.org/content/10.1101/814350v1) to identify the respective consortia's requirements.

THE DATA IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE DATA.
Abstract

Leptomeningeal metastasis (LM), a diffuse form of brain metastasis, is a rare and fatal progression of non-small cell lung cancer (NSCLC). In LM, metastatic cancer cells spread to and reside in the brain meninges, the cerebrospinal fluid (CSF), and the cranial and spinal nerves. Rapid disease progression and scarce tissue availability hinder the scientific study of LM and its treatment. To overcome the critical lack of tissue and to determine the genetic profile of NSCLC LM, we developed methods to extract tumor-associated cell-free RNA (cfRNA) from CSF and to isolate and sequence circulating single cells from CSF. Herein, we used high-throughput qPCR targeting lung- and brain-associated genes to identify NSCLC LM-related RNA. A brain-specific gene signature (GFAP, NRGN, SNCB, ZBTB18) was detected in all CSF samples (control and metastases), whereas lung-specific genes (MUC1, SFTPB, SFTPD, SLC34A2) were detected in the CSF of patients with brain metastases. Normal, healthy CSF lacks a cellular component, whereas the CSF of patients with LM contains very low numbers of circulating tumor cells. Single cells from the CSF of 4 patients with NSCLC LM were captured with a microfluidic chip. Clustering of the cells (n = 197) by significantly differentially expressed genes revealed two distinct populations: white blood cells and tumor cells. These data identified cfRNA and single-cell transcriptome profiles specific to NSCLC LM relative to normal cells or to patients without NSCLC LM, and highlighted the metastasis-associated carcinoembryonic antigen-related cell adhesion molecule 6 (CEACAM6) as highly expressed in patients with NSCLC LM. CEACAM6 mRNA was detected in the CSF of 86% of patients with NSCLC LM but not in the CSF of control patients. In vitro inhibition of CEACAM6 protein led to decreased invasion of NSCLC cells, which was rescued by overexpression of the protein.
We have developed sensitive and robust techniques to leverage human CSF to study NSCLC LM.
Vinyl-substituted alcohols represent a highly useful class of molecular skeletons. Current methods typically require either stoichiometric metal reagents or preformed precursors. Herein, we report a nickel-catalyzed synthesis of vinyl-substituted alcohols via a 5-membered oxa-metallacycle. In this protocol, acetylene, the simplest alkyne and an abundant feedstock, is employed as an ideal C2 synthon. The reaction features mild conditions, good functional group tolerance and broad substrate scope. Mechanistic studies suggest that the oxa-metallacycle formed by cyclometalation of the aldehyde and acetylene is the key intermediate of this transformation, which is then terminated by silane-mediated σ-bond metathesis followed by reductive elimination.
Significance

Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases, including COVID-19. However, existing algorithms are far too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it, to our knowledge, the first method to simultaneously fold and align whole genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, which have among the longest genomes of any known RNA virus (∼30 kb). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA function. LinearTurboFold is a general technique for full-length genome studies and can help fight current and future pandemics.
Brain metastases have unique genetic mechanisms that enable them to engraft and grow either within the brain (solid) or on the surface of the brain and nerves (leptomeningeal disease; LMD). A solid brain metastasis occurs in up to 45% of all cancer patients, compared with an approximate 5% incidence of LMD. Unlike solid brain metastases, LMD cannot be resected and is associated with worse survival, often only weeks to months after diagnosis. Why metastatic cells favor one brain site over the other has yet to be determined. To date, there has been no systematic study of the genetic differences between LMD and solid brain metastases. Using whole-exome sequencing, we compared the mutational landscape of lung-to-brain metastases in eight cytology-positive LMD samples and 38 solid brain tumor metastases. Cerebrospinal fluid and normal control samples (blood or saliva) from patients with LMD were collected at Stanford Hospital. Samples underwent DNA extraction, indexed library preparation, exome enrichment, and paired-end sequencing on an Illumina NextSeq system. LMD mutations were compared with exome sequencing data of solid brain metastases from The Cancer Genome Atlas. We identified a subset of recurrently mutated genes (e.g. EGFR, TAS2R31, CACNA1I) that occurred more frequently in our LMD samples. Another subset of genes (e.g. KRAS, LRP1B, CSMD3) was more frequently mutated in the solid brain metastasis samples. TP53, MUC16 and MUC17 were among the genes shared across LMD and solid lung-to-brain metastases. No recurrent mutations were shared among all LMD samples, highlighting LMD heterogeneity. Solid brain metastases and LMD appear to harbor distinct genetic mutations, suggesting unique mechanisms for growth in these distinct brain niches. A better understanding of the genomic events facilitating lung-to-brain metastasis will improve targeted treatment options and, eventually, patient outcomes.
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms that predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts to model SARS-CoV-2 structures resort to single-sequence folding or to local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial for RNA function. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis of SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold’s purely in silico prediction not only is close to experimentally guided models for local structures but also goes far beyond them by capturing the end-to-end pairs between the 5’ and 3’ UTRs (∼29,800 nt apart), which match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and to full-length genome studies, and will be a useful tool in fighting current and future pandemics.

Significance Statement

Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases, including COVID-19. However, existing algorithms are far too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, which have among the longest genomes of any known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA function. LinearTurboFold is a general technique for full-length genome studies and can help fight current and future pandemics.