Motivation
Microbial biomarker identification has become a key application for variable selection methods, yet real-world studies pose new challenges in linking high-dimensional longitudinal microbiome data with complex time-to-event outcomes. To our knowledge, these challenges have not been sufficiently addressed in the literature. The nature of survival endpoints complicates the definition of patient groups, which is necessary for directly comparing longitudinal trajectories with differential abundance testing methods. In addition, existing log-ratio lasso regression methods have not been systematically extended to Cox and Fine-Gray models, particularly with respect to incorporating longitudinal microbial features.
Highlights
• FLORAL correlates microbial features with continuous, binary, or survival outcomes
• FLORAL utilizes longitudinal data to improve feature selection in survival models
• False discoveries are controlled by FLORAL's two-step selection procedure
• FLORAL identifies meaningful microbial markers in allo-HCTs
Summary
Identifying predictive biomarkers of patient outcomes from high-throughput microbiome data is of high interest, but existing computational methods do not satisfactorily account for complex survival endpoints, longitudinal samples, and taxon-specific sequencing biases. We present FLORAL, an open-source tool that performs scalable log-ratio lasso regression and microbial feature selection for continuous, binary, time-to-event, and competing risk outcomes, and that accepts longitudinal microbiome data as time-dependent covariates. The proposed method adapts the augmented Lagrangian algorithm to a zero-sum constrained optimization problem and enables a two-stage screening process for enhanced false-positive control.
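The zero-sum constrained log-ratio lasso mentioned in the Summary can be illustrated for the continuous-outcome case: minimize (1/2n)||y - Xβ||² + λ||β||₁ subject to Σβ_j = 0, where X holds log relative abundances. Because the coefficients sum to zero, rescaling every taxon in a sample by a common factor (for example, a different sequencing depth) leaves the linear predictor unchanged. The sketch below is a minimal illustration under stated assumptions: squared-error loss only (no Cox or Fine-Gray likelihood), a fixed penalty parameter ρ, coordinate descent for the inner problem, no two-stage screening, and a hypothetical function name `zero_sum_lasso_alm`. It is not FLORAL's implementation.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Lasso soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def zero_sum_lasso_alm(X, y, lam, rho=1.0, n_outer=50, n_inner=200, tol=1e-8):
    """Hypothetical sketch: min (1/2n)||y - Xb||^2 + lam*||b||_1 s.t. sum(b) = 0.

    X: n x p matrix of log relative abundances (assumed; zeros must be
    handled, e.g. with a pseudocount, before taking logs).
    Augmented Lagrangian: for a fixed multiplier mu, run coordinate descent on
        (1/2n)||y - Xb||^2 + lam*||b||_1 + mu*sum(b) + (rho/2)*sum(b)^2,
    then update mu <- mu + rho*sum(b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Center X and y so the unpenalized intercept can be dropped.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n  # x_j^T x_j / n
    b = np.zeros(p)
    r = yc.copy()  # current residual yc - Xc @ b
    total = 0.0    # running value of sum(b)
    mu = 0.0       # Lagrange multiplier for the zero-sum constraint
    for _ in range(n_outer):
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                old = b[j]
                rest = total - old  # sum of the other coefficients
                # Numerator of the coordinate-wise minimizer before thresholding.
                z = Xc[:, j] @ r / n + col_sq[j] * old - rho * rest - mu
                new = soft_threshold(z, lam) / (col_sq[j] + rho)
                if new != old:
                    r -= Xc[:, j] * (new - old)
                    total += new - old
                    b[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break
        mu += rho * total  # multiplier step on the constraint violation
        if abs(total) < tol:
            break
    return b
```

The multiplier update pushes Σβ_j toward zero across outer iterations instead of projecting the coefficients after every sweep, which keeps each inner step a plain soft-thresholding update.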
In extensive simulations and real-data analyses, FLORAL achieved consistently better false-positive control than other lasso-based approaches and better sensitivity than popular differential abundance testing methods for datasets with smaller sample sizes. In a survival analysis of allogeneic hematopoietic cell transplant recipients, FLORAL showed considerable improvement in microbial feature selection when using longitudinal microbiome data rather than baseline microbiome data alone.
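A common way to use longitudinal samples as time-dependent covariates in Cox-type models is the counting-process layout: each patient's follow-up is split into (start, stop] intervals, and each interval carries the features of the most recent sample. The helper below (`to_counting_process`, a hypothetical name) sketches that layout under stated assumptions: samples collected at or after the event or censoring time are dropped, follow-up before the first sample is not counted, and the event flag is set only on the final interval. FLORAL's own data handling may differ.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[float, float, int, Any]  # (start, stop, event, features)


def to_counting_process(
    sample_times: Sequence[float],
    features: Sequence[Any],
    followup: float,
    event: int,
) -> List[Row]:
    """Hypothetical sketch: one patient's samples -> (start, stop] rows.

    Assumptions: a sample only informs the hazard after it is collected,
    so samples at or after `followup` are dropped, and time before the
    first sample is not included in the risk set.
    """
    kept = sorted(
        ((t, f) for t, f in zip(sample_times, features) if t < followup),
        key=lambda tf: tf[0],
    )
    rows: List[Row] = []
    for i, (start, feat) in enumerate(kept):
        is_last = i == len(kept) - 1
        stop = followup if is_last else kept[i + 1][0]
        if stop <= start:  # duplicate sampling time: skip zero-length interval
            continue
        # The event can only occur at the end of the last interval.
        rows.append((start, stop, int(event) if is_last else 0, feat))
    return rows
```

With a single baseline sample the function returns one row spanning the whole follow-up, which corresponds to the baseline-only analysis.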
A poor-quality diet lacking plant-based foods, elevated body mass index (BMI), insulin resistance, microbiome dysbiosis, and inflammation have all been implicated in the progression of monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM) to multiple myeloma (MM). Whether a dietary intervention can delay this progression has not been investigated.
Methods
We conducted a pilot, single-arm trial providing a high-fiber plant-based diet (HFPBD) intervention for 12 weeks and health coaching for 24 weeks to 20 patients with MGUS/SMM and a BMI >25 (NCT04920084). The primary endpoint was feasibility (adherence and BMI reduction). Quality of life, metabolic (insulin, adiponectin, and leptin), microbiome (16S rRNA and shotgun metagenomic sequencing), and immune profiling (peripheral blood flow cytometry, an Olink proteomic panel, and paired bone marrow single-cell RNA-seq/ATAC-seq) were performed. Transgenic Vk*MYC mice fed a high-fiber or control diet at the mouse SMM (mSMM) phase were monitored for progression to active mouse MM (mMM). Microbiome and immune profiles were compared between the two groups of mice.
Results
The HFPBD intervention was safe and feasible, improved quality of life, and addressed modifiable risk factors: metabolic profile (BMI, insulin resistance, adiponectin/leptin ratio), microbiome profile (diversity and butyrate producers), and immune (monocyte) subsets. A reduction in the long-term progression trajectory was observed in 2 patients. Consistent with this, a high-fiber diet delayed progression to mMM in Vk*MYC mice with mSMM, increasing progression-free survival from 12 weeks in the control arm to 30 weeks in the intervention arm, with 40% of mice never progressing to mMM within the observation period in the control arm. Both human and mouse data showed that the HFPBD modulated gut microbiota diversity and composition, favoring the expansion of butyrate-producing bacteria. Short-chain fatty acid levels increased in the feces of mice fed the high-fiber diet. Integrated data from human bone marrow and peripheral blood indicated that the intervention reduced inflammatory biomarkers, including C-reactive protein, and skewed the immune response toward classical anti-inflammatory CD14+ monocytes as well as an IFN-γ signature associated with antitumoral functions. Consistently, the bone marrow of Vk*MYC mice fed the high-fiber diet was more heavily infiltrated by IFN-γ-producing T lymphocytes and contained fewer exhausted T cells and immunosuppressive myeloid cells.
Conclusions
This is the first interventional clinical trial and in vivo study to show that an HFPBD intervention delays progression from MGUS/SMM to MM. Our data support a beneficial anti-inflammatory role of the HFPBD, providing a link between diet, microbiota, and immune modulation in delaying disease progression in MGUS/SMM.
Acknowledgements
We thank Nadja Pinnavaia, PhD, founder of Plantable, for providing subsidized meals and participant coaching through Plantable for the study. We acknowledge the core facilities at Sloan Kettering Institute: Integrated Genomics Operation, Single Cell Analytics Innovation Laboratory, Molecular Microbiology Facility, Hematology Oncology Tissue Bank, Immune Discovery and Modeling Service, and Donald B. and Catherine C. Marron Cancer Metabolism Center. This NUTRIVENTION study was funded by the Allen Foundation, Inc (U.A.S.), the National Cancer Institute MSK Paul Calabresi Career Development Award for Clinical Oncology K12 CA184746 (U.A.S.), the Paula and Rodger Riney Foundation (U.A.S., A.M.L., S.G., J.U.P., M.v.d.B.) and the Peter and Susan Solomon Foundation (M.v.d.B.). This study was funded in part through the National Institutes of Health/National Cancer Institute Cancer Center Support Grant at Memorial Sloan Kettering, P30 CA008748. U.A.S. was supported by the National Cancer Institute MSK Paul Calabresi Career Development Award for Clinical Oncology K12 CA184746, the International Myeloma Society Career Development Award, and the American Society of Hematology Scholar Award. U.A.S. was also supported by the American Society of Hematology Clinical Research Training Institute and the Transdisciplinary Research in Energetics and Cancer training workshop R25CA203650 (PI: Melinda Irwin) for this concept. The in vivo mouse study was funded by AIRC under the IG 2018 - ID. 21808 and IG 2023 - ID. 28770 projects, as well as by the Leukemia and Lymphoma Society (L&LS; Grant #6618-21) (M.B.).
Trial Registration
NCT04920084.
Ethics Approval
The study was conducted in accordance with recognized ethical guidelines (Belmont Report and Declaration of Helsinki) and was approved by the Memorial Sloan Kettering (MSK) Institutional Review Board. Written informed consent was obtained from patients.
Correlating time-dependent patient characteristics with matched microbiome samples can help identify biomarkers in longitudinal microbiome studies. Existing approaches typically repeat a pre-specified model for every taxonomic feature and then apply a multiple testing adjustment for false discovery rate (FDR) control. In this work, we develop an alternative strategy based on log-ratio penalized generalized estimating equations, which directly models the longitudinal patient characteristic of interest as the outcome variable and treats microbial features as high-dimensional compositional covariates. A cross-validation procedure is developed for variable selection and for model selection among different working correlation structures. In extensive simulations, the proposed method achieved higher sensitivity than state-of-the-art methods while robustly controlling the FDR. In analyses correlating longitudinal dietary intake with microbial features from matched samples of cancer patients, the proposed method effectively identified gut health indicators and clinically relevant microbial markers, demonstrating robust utility in real-world applications. The method is implemented in the open-source R package FLORAL, which is available at https://vdblab.github.io/FLORAL/.
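Two ingredients of the cross-validation procedure can be sketched concretely: candidate working correlation matrices for a subject with m repeated samples, and fold assignment at the subject level, so that samples from one patient never appear in both the training and validation data. Both functions (`working_correlation`, `subject_folds`) are hypothetical names written for illustration. The structures shown (independence, exchangeable, AR(1)) are the standard GEE choices; whether FLORAL uses exactly these candidates or this fold scheme is an assumption.

```python
import numpy as np


def working_correlation(kind: str, m: int, alpha: float = 0.0) -> np.ndarray:
    """Hypothetical sketch: m x m working correlation for one subject."""
    if kind == "independence":
        return np.eye(m)
    if kind == "exchangeable":
        # Same correlation alpha between any two visits of a subject.
        R = np.full((m, m), float(alpha))
        np.fill_diagonal(R, 1.0)
        return R
    if kind == "ar1":
        # Correlation decays as alpha**|j - k| with the visit lag.
        idx = np.arange(m)
        return float(alpha) ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError(f"unknown working correlation: {kind}")


def subject_folds(subject_ids, k: int, seed: int = 0) -> np.ndarray:
    """Assign whole subjects (not individual samples) to k folds."""
    subjects = np.unique(subject_ids)
    shuffled = np.random.default_rng(seed).permutation(subjects)
    fold_of = {s: i % k for i, s in enumerate(shuffled)}
    return np.array([fold_of[s] for s in subject_ids])
```

In a full procedure, each candidate (penalty, working correlation) pair would be fit on all folds but one and scored on the held-out subjects; the scoring loss is left unspecified here.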