Flanker: a tool for comparative genomics of gene flanking regions

2021 
Analysing the flanking sequences surrounding genes of interest is often highly relevant to understanding the role of mobile genetic elements (MGEs) in horizontal gene transfer, particularly for antimicrobial resistance genes. Here, we present Flanker, a Python package that performs alignment-free clustering of gene flanking sequences in a consistent format, allowing investigation of MGEs without prior knowledge of their structure. Flanker clusters flanking sequences based on Mash distances, allowing for easy comparison of similarity and the extent of this similarity across sequences. Additionally, Flanker can be flexibly parameterised to fine-tune outputs by characterising upstream and downstream regions separately and investigating variable lengths of flanking sequence. We apply Flanker to two recent datasets describing plasmid-associated carriage of important carbapenemase genes (blaOXA-48 and blaKPC-2/3) and show that it successfully identifies distinct clusters of flanking sequences (flank patterns), including both known and previously uncharacterised structural variants. We demonstrate that flank patterns are linked to geographical regions and carbapenem phenotypes, suggesting they may be useful as epidemiological markers. Flanker is freely available under an MIT license at https://github.com/wtmatlock/flanker.

Data Summary
NCBI accession numbers for all sequencing data used in this study are provided in Supplementary Table 1. The analysis performed in this manuscript can be reproduced in a Binder environment provided on the Flanker GitHub page (https://github.com/wtmatlock/flanker).
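The general idea of extracting flanking regions and clustering them without alignment can be illustrated with a minimal sketch. This is not Flanker's implementation or its API: the function names (extract_flanks, jaccard_distance, cluster) are hypothetical, the exact k-mer Jaccard distance is used as a simple stand-in for the sketch-based Mash distance, and single-linkage grouping under a fixed threshold is an assumed clustering rule chosen for brevity.

```python
from itertools import combinations


def extract_flanks(seq, gene_start, gene_end, window, side="both"):
    """Return up to `window` bases flanking the gene at seq[gene_start:gene_end].

    Hypothetical helper: treats the sequence as linear and ignores strand.
    """
    upstream = seq[max(0, gene_start - window):gene_start]
    downstream = seq[gene_end:gene_end + window]
    if side == "upstream":
        return upstream
    if side == "downstream":
        return downstream
    return upstream + downstream


def kmer_set(seq, k=5):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=5):
    """1 - Jaccard index of the two k-mer sets (assumed stand-in for Mash distance)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def cluster(flanks, threshold, k=5):
    """Single-linkage grouping: flanks within `threshold` of each other share a cluster.

    `flanks` maps a sequence name to its flanking sequence.
    """
    parent = {name: name for name in flanks}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(flanks, 2):
        if jaccard_distance(flanks[a], flanks[b], k) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in flanks:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Calling extract_flanks with side="upstream" or side="downstream" and a range of window values mirrors, in toy form, the separate treatment of upstream and downstream regions and the variable flank lengths described above; the resulting groups play the role of flank patterns.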