Predictive Modeling of Pseudomonas syringae Virulence on Bean using Gradient Boosted Decision Trees

2021 
Pseudomonas syringae is a genetically diverse bacterial species complex responsible for numerous agronomically important crop diseases. Individual isolates of P. syringae are typically assigned pathovar names based on their host of isolation and the associated disease symptoms, and these pathovar designations are often assumed to reflect host specificity. Unfortunately, this assumption has rarely been rigorously tested, which poses a challenge when trying to identify genetic factors associated with host specificity. Here we develop a rapid seed infection assay to measure the virulence of 121 diverse P. syringae isolates on common bean (Phaseolus vulgaris). This collection includes P. syringae phylogroup 2 bean isolates (pathovar syringae) that cause bacterial spot disease and P. syringae phylogroup 3 bean isolates (pathovar phaseolicola) that cause the much more serious halo blight disease. We find that phylogroup 2 strains generally show lower levels of host specificity on bean, with the average level of virulence for all strains in this phylogroup (irrespective of host of isolation) being higher than the average level for all other P. syringae strains. We then use gradient boosted decision trees to model the P. syringae virulence weights using whole-genome k-mers, type III secreted effector k-mers, and the presence/absence of type III effectors and phytotoxins. Our machine learning model performed best using whole-genome data, and we were able to predict bean virulence with high accuracy (mean absolute error as low as 0.05). Finally, we functionally validated the model by predicting virulence for 16 strains and found that 15 (94%) of the strains had virulence levels within the bounds of the estimated predictions given the calculated RMSE values (±0.20). This study further illustrates that P. syringae phylogroup 2 strains may have evolved a different lifestyle than other P. syringae strains and demonstrates the power of machine learning for predicting host-specific adaptation.
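The modeling pipeline summarized above (k-mer presence/absence features, a gradient boosted regressor, and validation against an RMSE band) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the software used, so the example below is plain Python with depth-1 trees (stumps) as the base learners, and the sequences, virulence scores, and all function names are invented purely for illustration.

```python
import math

def kmer_presence(seq, k):
    """Return the set of k-mers present in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def featurize(seqs, k):
    """Binary presence/absence matrix over every k-mer seen in seqs."""
    sets = [kmer_presence(s, k) for s in seqs]
    vocab = sorted(set().union(*sets))
    X = [[1 if km in s else 0 for km in vocab] for s in sets]
    return X, vocab

def fit_stump(X, residuals):
    """Pick the binary feature whose split gives the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        absent = [r for x, r in zip(X, residuals) if x[j] == 0]
        present = [r for x, r in zip(X, residuals) if x[j] == 1]
        if not absent or not present:
            continue  # feature does not split the data
        m0 = sum(absent) / len(absent)
        m1 = sum(present) / len(present)
        sse = (sum((r - m0) ** 2 for r in absent)
               + sum((r - m1) ** 2 for r in present))
        if best is None or sse < best[0]:
            best = (sse, j, m0, m1)
    return best

def fit_gbdt(X, y, n_rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        _, j, m0, m1 = stump
        stumps.append((j, m0, m1))
        pred = [p + lr * (m1 if x[j] else m0) for x, p in zip(X, pred)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + sum(lr * (m1 if x[j] else m0) for j, m0, m1 in stumps)

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def fraction_within(y_true, y_pred, tol):
    """Share of observations that fall within +/- tol of the prediction."""
    hits = sum(1 for a, b in zip(y_true, y_pred) if abs(a - b) <= tol)
    return hits / len(y_true)

# Toy data (hypothetical): strains carrying the 3-mer "GGA" are more virulent.
seqs = ["ACGGATTC", "TTGGACCA", "GGACTTAC", "ACGTTTCA", "CCATTACG", "TTACGTCA"]
y = [0.90, 0.80, 0.85, 0.10, 0.20, 0.15]

X, vocab = featurize(seqs, k=3)
model = fit_gbdt(X, y)
train_pred = [predict(model, x) for x in X]
baseline_pred = [sum(y) / len(y)] * len(y)

print("training MAE:", mae(y, train_pred))
print("training RMSE:", rmse(y, train_pred))
print("within +/-0.20:", fraction_within(y, train_pred, 0.20))
```

The `fraction_within` helper mirrors the kind of check reported for the 16 validation strains, where an observed value counts as a hit if it lies within the RMSE band around the prediction. The errors printed here are training errors on toy data; in practice they would be computed on held-out strains, and a tested library implementation with deeper trees would normally replace the hand-written stumps.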