Privacy-preserving biomedical database queries with optimal privacy-utility trade-offs

2020 
Sharing data across research groups is an essential driver of biomedical research. In particular, biomedical databases with interactive query-answering systems allow users to retrieve information from the database using restricted types of queries (e.g. the number of subjects satisfying certain criteria). While these systems aim to facilitate the sharing of aggregate biomedical insights without divulging sensitive individual-level data, they can still leak private information about the individuals in the database through the query answers. Existing strategies to mitigate such risks either provide insufficient levels of privacy or greatly diminish the usefulness of the database. Here, we draw upon recent advances in differential privacy to introduce privacy-preserving query-answering mechanisms for biomedical databases that provably maximize the expected utility of the system while achieving formal privacy guarantees. We demonstrate that our methods improve accuracy over existing approaches for a range of use cases, including count, membership, and association queries. Notably, our new theoretical results extend the proof of optimality of the underlying mechanism, previously known only for count queries with symmetric utility functions, to the asymmetric utility functions needed for count queries in cohort discovery workflows and for membership queries, a core functionality of the Beacon Project recently launched by the Global Alliance for Genomics and Health (GA4GH). Our work presents a path towards biomedical query-answering systems that achieve the best privacy-utility trade-offs permitted by the theory of differential privacy.
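To make the count-query setting concrete, the sketch below assumes that the "underlying mechanism" is the truncated geometric mechanism from the differential-privacy literature (the abstract does not name it). A true count c in {0, ..., n} is released as c + Z clamped to [0, n], where Z is two-sided geometric noise with P(Z = z) proportional to exp(-epsilon * |z|). The released value is then post-processed by a Bayesian remap that, for each released y, picks the report minimizing expected loss under a prior, which is how an asymmetric utility (e.g. penalizing overestimated cohort sizes more than underestimates) can enter. All function names, the uniform prior, and the 3:1 loss weights are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence


def truncated_geometric_matrix(n: int, epsilon: float) -> List[List[float]]:
    """Exact output distribution P[c][y] of the truncated geometric mechanism.

    Row c is the distribution of the released count y in {0, ..., n}
    when the true count is c. Requires n >= 1 and epsilon > 0.
    """
    if n < 1 or epsilon <= 0:
        raise ValueError("need n >= 1 and epsilon > 0")
    alpha = math.exp(-epsilon)
    matrix = []
    for c in range(n + 1):
        row = []
        for y in range(n + 1):
            if y == 0:
                # All noise mass at or below -c is clamped to 0.
                p = alpha ** c / (1 + alpha)
            elif y == n:
                # All noise mass at or above n - c is clamped to n.
                p = alpha ** (n - c) / (1 + alpha)
            else:
                p = (1 - alpha) / (1 + alpha) * alpha ** abs(y - c)
            row.append(p)
        matrix.append(row)
    return matrix


def sample_truncated_geometric(true_count: int, n: int, epsilon: float,
                               rng: random.Random) -> int:
    """Release true_count + Z clamped to [0, n], with Z two-sided geometric."""

    def geometric() -> int:
        # Failures before a success: P(k) = (1 - alpha) * alpha**k, alpha = e^-eps.
        u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
        return math.floor(math.log(u) / -epsilon)

    noise = geometric() - geometric()  # difference of i.i.d. geometrics
    return min(max(true_count + noise, 0), n)


def bayes_remap(matrix: Sequence[Sequence[float]], prior: Sequence[float],
                loss: Callable[[int, int], float]) -> List[int]:
    """For each released y, choose the report r minimizing prior-weighted expected loss."""
    size = len(matrix)
    remap = []
    for y in range(size):
        def expected_loss(r: int) -> float:
            return sum(prior[c] * matrix[c][y] * loss(c, r) for c in range(size))
        remap.append(min(range(size), key=expected_loss))
    return remap


def asymmetric_loss(true_count: int, reported: int,
                    over_weight: float = 3.0, under_weight: float = 1.0) -> float:
    """Hypothetical loss: each unit of overestimate costs more than an underestimate."""
    if reported > true_count:
        return over_weight * (reported - true_count)
    return under_weight * (true_count - reported)


if __name__ == "__main__":
    n, epsilon = 10, 1.0
    P = truncated_geometric_matrix(n, epsilon)
    prior = [1.0 / (n + 1)] * (n + 1)  # hypothetical uniform prior over counts
    remap = bayes_remap(P, prior, asymmetric_loss)
    rng = random.Random(0)
    y = sample_truncated_geometric(4, n, epsilon, rng)
    print("released:", y, "reported after remap:", remap[y])
```

Because the remap only reads the already-privatized output y, it is post-processing and does not weaken the epsilon-differential-privacy guarantee: adjacent true counts c and c + 1 produce output probabilities within a factor of exp(epsilon) of each other, which the exact matrix makes easy to check.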