Abstract
For patients with hormone receptor-positive, early breast cancer without HER2 amplification, multigene expression assays, including the Oncotype DX® recurrence score (RS), have been clinically validated to identify patients who stand to derive added benefit from adjuvant cytotoxic chemotherapy. However, cost and turnaround time have limited their global adoption despite their recommendation in practice guidelines. We investigated whether routinely available hematoxylin and eosin (H&E)-stained pathology slides could serve as a surrogate data substrate for triage by predicting RS with machine learning methods. We trained and validated a multimodal transformer model, Orpheus, on 6,203 patients across three independent cohorts, taking both H&E images and their corresponding synoptic text reports as input. As measured by Pearson's correlation, our model accurately inferred RS from whole-slide images (r = 0.63 (95% C.I. 0.58 - 0.68); n = 1,029), from the raw text of the corresponding reports (r = 0.58 (95% C.I. 0.51 - 0.64); n = 972), and from their combination (r = 0.68 (95% C.I. 0.64 - 0.73); n = 964). For predicting high-risk disease (RS > 25), our model achieved an area under the receiver operating characteristic curve (AUROC) of 0.89 (95% C.I. 0.83 - 0.94) and an area under the precision-recall curve (AUPRC) of 0.64 (95% C.I. 0.60 - 0.82), compared with 0.49 (95% C.I. 0.36 - 0.64) for an existing nomogram based on clinical and pathologic features. Moreover, our model generalized well to two external international cohorts, identifying recurrence risk (r = 0.61, p < 10⁻⁴, n = 452; r = 0.60, p < 10⁻⁴, n = 575) and high-risk status (AUROC = 0.80, p < 10⁻⁴, AUPRC = 0.68, p < 10⁻⁴, n = 452; AUROC = 0.83, p < 10⁻⁴, AUPRC = 0.73, p < 10⁻⁴, n = 575) from whole-slide images.
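The evaluation above rests on two standard quantities: Pearson's r between predicted and measured RS, and AUROC for the binary high-risk label RS > 25. The sketch below shows one common way to compute them, assuming a Fisher z-transform confidence interval for r and the rank (Mann-Whitney) formulation of AUROC. The function names are hypothetical, and the abstract does not state how its confidence intervals were obtained (they may, for example, come from bootstrapping).

```python
import numpy as np


def pearson_r_with_fisher_ci(x, y, z_crit=1.96):
    """Pearson's r with an approximate 95% CI via the Fisher z-transform (assumed method)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 paired observations")
    r = float(np.corrcoef(x, y)[0, 1])
    z = np.arctanh(r)              # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)      # approximate standard error of z
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, float(lo), float(hi)


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]   # all positive-negative pairs
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


# Usage: binarize measured RS at the high-risk cutoff and score the model's predicted RS.
measured_rs = np.array([10, 30, 20, 40, 18, 27])
predicted_rs = np.array([12, 28, 22, 35, 15, 21])
print(pearson_r_with_fisher_ci(predicted_rs, measured_rs))
print(auroc(predicted_rs, measured_rs > 25))
```

The pairwise comparison uses O(n_pos × n_neg) memory, which is fine for cohorts of around a thousand patients; for much larger sets a rank-based formula would be preferable.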
Probing the biologic underpinnings of the model's decisions revealed tumor cell size heterogeneity, immune cell infiltration, a proliferative transcriptional program, and stromal fraction as correlates of higher-risk predictions. We conclude that, at an operating point of 94.4% precision and 33.3% recall, this model could help increase global adoption and shorten the lag between resection and adjuvant therapy.
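An operating point such as "94.4% precision and 33.3% recall" corresponds to one decision threshold on the model's risk score. The sketch below assumes one plausible selection rule, namely picking the threshold that maximizes recall subject to a minimum precision; the abstract does not say how its operating point was chosen, and the function names are hypothetical.

```python
import numpy as np


def precision_recall_at(scores, labels, threshold):
    """Precision and recall when samples with score >= threshold are called high risk."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pred = scores >= threshold
    tp = int(np.sum(pred & labels))
    fp = int(np.sum(pred & ~labels))
    fn = int(np.sum(~pred & labels))
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")  # undefined with no positive calls
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return precision, recall


def pick_operating_point(scores, labels, min_precision):
    """Threshold with the highest recall among those meeting min_precision (assumed selection rule)."""
    best = None  # (threshold, precision, recall)
    for thr in np.unique(scores):
        p, r = precision_recall_at(scores, labels, thr)
        if np.isnan(p) or p < min_precision:
            continue
        if best is None or (r, p) > (best[2], best[1]):
            best = (float(thr), p, r)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(pick_operating_point(scores, labels, min_precision=0.9))
```

A high-precision, lower-recall operating point suits a triage setting: the patients the model flags are very likely to be truly high risk, while the remainder can still be sent for the multigene assay.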
Pulsed electron-electron double resonance (PELDOR) is a well-established method for measuring nanometer distances between paramagnetic centres. Here, we demonstrate on three rigid, conjugated biradicals how an exchange coupling constant J and its distribution ΔJ influence PELDOR data and their analysis. In principle, two combinations of J and the dipolar coupling constant D are consistent with the experimental data in each case. The correct one, including the sign of J, can be determined by simulation if the two halves of the Pake pattern are sufficiently separated.
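To see how J and ΔJ enter the data, the sketch below simulates only the intramolecular dipolar modulation of a PELDOR trace for an isotropic distribution of orientations. It assumes the frequency ν(θ) = ν⊥(1 − 3cos²θ) + J with ν⊥ [MHz] = 52.04 / r³ [nm³] for two g ≈ 2 spins; sign and factor-of-2 conventions for J differ between authors, and modulation depth, background decay, and orientation selection are deliberately ignored. The function name and the Gaussian shape of the J distribution are assumptions for illustration, not the analysis used in the paper.

```python
import numpy as np

# Dipolar constant for two g ~ 2.0 electron spins: nu_perp [MHz] = 52.04 / r^3 [nm^3]
DIPOLAR_CONST_MHZ_NM3 = 52.04


def peldor_form_factor(t_us, r_nm, j_mhz=0.0, dj_mhz=0.0, n_theta=1001, n_j=81):
    """Orientation-averaged dipolar modulation with exchange coupling J and Gaussian spread dJ.

    Assumed convention: nu(theta) = nu_perp * (1 - 3 cos^2 theta) + J (all in MHz).
    """
    t_us = np.asarray(t_us, dtype=float)
    nu_perp = DIPOLAR_CONST_MHZ_NM3 / r_nm**3
    # Isotropic orientations: cos(theta) is uniformly distributed on [0, 1]
    cos_theta = np.linspace(0.0, 1.0, n_theta)
    angular = nu_perp * (1.0 - 3.0 * cos_theta**2)
    if dj_mhz > 0.0:
        # Discretized Gaussian distribution of J (hypothetical shape), +/- 4 standard deviations
        j_values = np.linspace(j_mhz - 4 * dj_mhz, j_mhz + 4 * dj_mhz, n_j)
        weights = np.exp(-0.5 * ((j_values - j_mhz) / dj_mhz) ** 2)
        weights /= weights.sum()
    else:
        j_values, weights = np.array([j_mhz]), np.array([1.0])
    trace = np.zeros_like(t_us)
    for j, w in zip(j_values, weights):
        nu = angular + j  # MHz; MHz x microseconds gives cycles
        trace += w * np.cos(2.0 * np.pi * np.outer(nu, t_us)).mean(axis=0)
    return trace


t = np.linspace(0.0, 2.0, 201)                        # microseconds
v_plus = peldor_form_factor(t, r_nm=2.5, j_mhz=1.0)
v_minus = peldor_form_factor(t, r_nm=2.5, j_mhz=-1.0)
print(np.max(np.abs(v_plus - v_minus)))               # traces for +J and -J differ
```

Because the angular term is not symmetric about zero, +J and −J shift the two halves of the Pake pattern differently, which is why the sign of J can become distinguishable in simulation when those halves are well separated.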
Previously, we developed the deoxycytosine analog Ç (C-spin) as a bifunctional spectroscopic probe for the study of nucleic acid structure and dynamics using electron paramagnetic resonance (EPR) and fluorescence spectroscopy. To understand the effect of Ç on nucleic acid structure, we undertook a detailed crystallographic analysis. A 1.7 Å resolution crystal structure of Ç within a decamer A-form DNA duplex confirmed that Ç forms a non-perturbing base pair with deoxyguanosine, as designed. In the context of double-stranded DNA, Ç adopted a planar conformation. In contrast, a crystal structure of the free spin-labeled base ç displayed a ∼20° bend at the oxazine linkage. Density functional theory calculations revealed that the bent and planar conformations are close in energy and exhibit the same bending frequency. These results indicate a small degree of flexibility around the oxazine linkage, which may be a consequence of the antiaromaticity of a 16-π-electron ring system. Within DNA, the amplitude of the bending motion is restricted, presumably by base-stacking interactions. This structural analysis shows that Ç forms a planar, structurally non-perturbing base pair with G, indicating that it can be used with high confidence in EPR- or fluorescence-based studies of structure and dynamics.
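Why "close in energy" implies flexibility can be made concrete with a two-state Boltzmann estimate. The sketch below assumes two non-degenerate conformers (planar and bent) in thermal equilibrium; the energy gaps used are hypothetical, since the abstract does not report the computed difference, and a real estimate would also account for degeneracy of the bent form and entropic terms.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant, kJ/(mol K)


def two_state_populations(delta_e_kj_mol, temperature_k=298.15):
    """Boltzmann populations (lower, higher) of two non-degenerate states separated by delta_e."""
    ratio = math.exp(-delta_e_kj_mol / (R_KJ_PER_MOL_K * temperature_k))
    p_high = ratio / (1.0 + ratio)
    return 1.0 - p_high, p_high


# Hypothetical energy gaps between planar and bent conformers, in kJ/mol
for gap in (0.5, 2.0, 10.0):
    planar, bent = two_state_populations(gap)
    print(f"dE = {gap:4.1f} kJ/mol -> bent population {bent:.2f}")
```

A gap of a few kJ/mol, comparable to RT (about 2.5 kJ/mol at room temperature), leaves a substantial fraction in the higher-energy form, consistent with a flexible linkage in the free base; stacking in the duplex can then shift this balance toward the planar form.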
SARS-CoV-2 mutants carrying the ∆H69/∆V70 deletion in the amino-terminal domain of the Spike protein emerged independently in at least six lineages of the virus (namely, B.1.1.7, B.1.1.298, B.1.160, B.1.177, B.1.258, B.1.375). We analyzed SARS-CoV-2 samples collected from various regions of Slovakia between November and December 2020 that were presumed to contain the B.1.1.7 variant because of the Spike gene target drop-out in an RT-qPCR test caused by this deletion. Sequencing of these samples revealed that, although some samples were indeed confirmed as B.1.1.7, a substantial fraction contained another ∆H69/∆V70-carrying mutant belonging to lineage B.1.258, which had been circulating in Central Europe since August 2020, long before the import of B.1.1.7. Phylogenetic analysis shows that an early sublineage of B.1.258 acquired the N439K substitution in the receptor-binding domain (RBD) of the Spike protein and, later, also the ∆H69/∆V70 deletion in the Spike N-terminal domain (NTD). This variant was particularly common in several European countries, including the Czech Republic and Slovakia, but was quickly replaced by B.1.1.7 in early 2021.
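The core reasoning here is that S-gene target failure only reports the ∆H69/∆V70 deletion, so it cannot by itself separate B.1.1.7 from B.1.258 carrying the same deletion; additional Spike markers such as N501Y or N439K can. The sketch below encodes that logic with simplified, illustrative marker sets (the B.1.1.7 set lists its well-known Spike changes; the B.1.258 set covers only the sublineage described above). It is a hypothetical toy, not a real lineage caller such as pangolin.

```python
# Simplified Spike marker sets (illustrative only; real lineage assignment uses whole genomes)
SPIKE_MARKERS = {
    "B.1.1.7": {"H69del", "V70del", "Y144del", "N501Y", "A570D",
                "P681H", "T716I", "S982A", "D1118H"},
    "B.1.258 (N439K + H69del/V70del sublineage)": {"H69del", "V70del", "N439K"},
}


def s_gene_target_failure(spike_mutations):
    """RT-qPCR S-gene drop-out is expected whenever the H69/V70 deletion is present."""
    return {"H69del", "V70del"} <= set(spike_mutations)


def candidate_lineages(spike_mutations):
    """Lineages whose full marker set is contained in the observed Spike mutations."""
    observed = set(spike_mutations)
    return [name for name, markers in SPIKE_MARKERS.items() if markers <= observed]


sample = {"H69del", "V70del", "N439K"}
print(s_gene_target_failure(sample))   # drop-out alone would suggest B.1.1.7
print(candidate_lineages(sample))      # sequencing markers point to B.1.258
```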
Nucleoside Ç, which contains a rigid nitroxide spin label, is effectively reduced in DNA by sodium sulfide to the corresponding amine, yielding a fluorescent probe (Çff) that can report the identity of its base-pairing partner in duplex DNA.
Multigene tests provide information that may guide the choice of an optimal treatment regimen for breast cancer (BCa) patients. However, assignment of an individual tumor to any subtype or prognostic risk group shows only moderate reproducibility, depending on the assay, tumor composition, gene list, and expression thresholds. This single-sample discordance impedes clinical use and raises important questions about which test is the right one and whether multiple tests are better than one. We used multiplexed RNA fluorescent in situ hybridization of four BCa biomarkers (estrogen receptor, progesterone receptor, HER2, and Ki67) to guide laser capture microdissection followed by RNAseq. This technique, called mFISHseq, ensures tumor purity and facilitates interrogation of tumor heterogeneity, permitting unbiased whole-transcriptome analysis. To assess multigene test discordance, we applied mFISHseq to a cohort of 1,082 FFPE breast tumors with detailed clinicopathological data and derived molecular subtypes using research-based PAM50, AIMS, and our own 293-gene subtyping classifier. We also assigned patients to prognostic risk groups using research-based OncotypeDX, GENE70, risk of recurrence by subtype, and GGI. We observed considerable discordance, with 24% and 61% of patients having at least one multigene test in disagreement for molecular subtyping and prognostic risk assignment, respectively. To improve single-sample concordance, we implemented a simple voting scheme across the multigene classifiers to assign a consensus molecular subtype or risk group. Consensus subtyping reclassified 30% of patients into subtypes that better fit their transcriptomic risk and outcome, and further indicated that 60% of these reclassified patients had received suboptimal treatment. Likewise, our consensus prognostic risk approach mitigated discordance and provided prognostic insights for patients with high, low, and ultra-low risk.
By leveraging spatially resolved, tumor-enriched transcriptome profiling, mFISHseq alleviated sample-level discordance and assigned individuals to molecular subtypes or prognostic risk groups that better matched their outcomes, thus resolving limitations to clinical adoption.
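The discordance statistic and the consensus voting scheme described above can be sketched as follows. This assumes a plain majority vote per sample, with ties left unassigned; the abstract calls the scheme "simple voting" but does not specify tie handling, so that rule and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_call(calls: Sequence[str]) -> Optional[str]:
    """Majority vote across classifier calls for one sample; None if the top count is tied."""
    ranked = Counter(calls).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # hypothetical tie rule: leave the sample unassigned
    return ranked[0][0]


def is_discordant(calls: Sequence[str]) -> bool:
    """True if at least one classifier disagrees with the others for this sample."""
    return len(set(calls)) > 1


def discordance_rate(per_sample_calls: Sequence[Sequence[str]]) -> float:
    """Fraction of samples with at least one disagreeing classifier."""
    flags = [is_discordant(calls) for calls in per_sample_calls]
    return sum(flags) / len(flags)


# Each row: calls from three subtyping classifiers (e.g. PAM50, AIMS, a 293-gene classifier)
cohort = [["LumA", "LumA", "LumA"], ["LumA", "LumB", "LumA"], ["Basal", "Her2", "LumB"]]
print(discordance_rate(cohort))
print([consensus_call(calls) for calls in cohort])
```

With an odd number of classifiers, a two-way tie cannot occur, but a three-way split (as in the last sample) still leaves no majority, which is why some explicit tie rule is needed.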