Semantic Search Pipeline: From Query Expansion to Concept Forging

2021 
When searching a database for a topic (e.g. Covid-19), a precise match may not exist, especially if the topic is novel. Furthermore, the topic may surface in the data under different guises (‘Covid-19,’ ‘coronavirus,’ ‘pandemic,’ etc.). The results of a keyword search are limited by the querier’s imagination and familiarity with the data. Such searches have high precision but low recall. To increase recall, we present the Semantic Search Pipeline, a novel approach to document retrieval that uses distributional semantic models and locality-sensitive hashing to expand queries and efficiently identify other relevant documents that may not contain the obvious query terms. We evaluate the pipeline on a dataset curated from financial customer service call centers and observe a 32% increase in recall over a simple keyword baseline, with a negligible drop in precision. Furthermore, we present the notion of concept forging, a process of tracing a topic or concept through time and through its various surface realizations. Applied to Covid-19, the search pipeline retrieves a set of documents that allows us to uncover the short- and long-term effects of Covid-19 on the lives of the people and businesses impacted by it. Although Covid-19 is a timely test case, our search pipeline is general in nature and can be easily applied to a wide range of topics.
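
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: expanding a query with nearby terms in a distributional (embedding) space, and bucketing documents with locality-sensitive hashing so that similar documents can be found without scanning everything. The embedding values, the similarity threshold, the helper names (`expand_query`, `lsh_signature`, `build_index`), and the choice of random-hyperplane LSH are all assumptions made for illustration, not the authors' method.

```python
import math
import random

# Toy 3-dimensional word vectors. The values are invented for illustration;
# a real pipeline would load vectors from a trained distributional model.
EMBEDDINGS = {
    "covid-19": [0.9, 0.8, 0.1],
    "coronavirus": [0.85, 0.75, 0.15],
    "pandemic": [0.8, 0.9, 0.2],
    "mortgage": [0.1, 0.2, 0.9],
    "overdraft": [0.05, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(term, embeddings, threshold=0.9):
    """Return vocabulary terms whose vectors lie close to the query term.

    The threshold is an assumed tuning knob, not a value from the paper.
    """
    query_vec = embeddings[term]
    return sorted(
        word
        for word, vec in embeddings.items()
        if word != term and cosine(query_vec, vec) >= threshold
    )


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw random hyperplane normals for sign-based (cosine) LSH."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(a * b for a, b in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def doc_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens; None if there are none."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def build_index(docs, embeddings, planes):
    """Group document ids by LSH signature so lookups touch one bucket."""
    index = {}
    for doc_id, tokens in docs.items():
        vec = doc_vector(tokens, embeddings)
        if vec is None:
            continue
        index.setdefault(lsh_signature(vec, planes), []).append(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        "call-1": ["coronavirus", "mortgage"],
        "call-2": ["pandemic"],
        "call-3": ["overdraft"],
    }
    planes = make_hyperplanes(dim=3, n_planes=4)
    index = build_index(docs, EMBEDDINGS, planes)

    expanded = ["covid-19"] + expand_query("covid-19", EMBEDDINGS)
    query_sig = lsh_signature(doc_vector(expanded, EMBEDDINGS), planes)
    print("expanded query:", expanded)
    print("candidate documents:", index.get(query_sig, []))
```

In this sketch a document can be retrieved even when it never mentions the literal query term, because the lookup compares hashed averages of word vectors rather than keywords. Which documents share a bucket depends on the randomly drawn hyperplanes, so practical systems typically use several hash tables and re-rank candidates by exact similarity.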