Semantic Query Labeling Through Synthetic Query Generation

2021 
Searching in a domain-specific corpus of structured documents (e.g., e-commerce, media streaming services, job-seeking platforms) is often handled as a traditional retrieval task or through faceted search. Semantic Query Labeling, the task of locating the constituent parts of a query and assigning a predefined, domain-specific semantic label to each of them, allows retrieval to leverage the structure of documents while leaving the keyword-based query formulation unaltered. Due to both the lack of a publicly available dataset and the high cost of producing one, there have been few published works on this task. In this paper, based on the assumption that a corpus already contains the information users search for, we propose a method for the automatic generation of semantically labeled queries. We show that a semantic tagger (based on BERT, gazetteer-based features, and Conditional Random Fields) trained on our synthetic queries achieves results comparable to those obtained by the same model trained on real-world data. We also provide a large dataset of manually annotated queries in the movie domain suitable for studying Semantic Query Labeling. We hope that the public availability of this dataset will stimulate future research in this area.
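To make the idea of generating labeled queries from a corpus concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes each document is a flat mapping from field name to text, samples a few fields, and emits their tokens with BIO-style tags. The field names, the `MOVIES` sample data, and the functions `synth_query` and `generate` are all hypothetical and introduced only for illustration.

```python
import random

# Hypothetical structured documents; field names and values are illustrative only.
MOVIES = [
    {"title": "the matrix", "actor": "keanu reeves", "genre": "sci-fi", "year": "1999"},
    {"title": "heat", "actor": "al pacino", "genre": "crime", "year": "1995"},
]


def synth_query(doc, rng, max_fields=2):
    """Sample up to max_fields fields from one document and return (tokens, tags).

    Tags follow a BIO scheme: the first token of a field value gets B-<FIELD>,
    the remaining tokens of that value get I-<FIELD>.
    """
    k = rng.randint(1, min(max_fields, len(doc)))
    fields = rng.sample(sorted(doc), k)  # sorted() gives a stable sequence to sample from
    tokens, tags = [], []
    for field in fields:
        for i, word in enumerate(doc[field].split()):
            tokens.append(word)
            tags.append(("B-" if i == 0 else "I-") + field.upper())
    return tokens, tags


def generate(docs, n, seed=0):
    """Generate n synthetic labeled queries, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [synth_query(rng.choice(docs), rng) for _ in range(n)]


if __name__ == "__main__":
    for tokens, tags in generate(MOVIES, 3):
        print(list(zip(tokens, tags)))
```

Pairs produced this way could serve as training data for a sequence tagger. How closely such queries resemble real user queries depends on the sampling strategy, which this sketch leaves deliberately naive.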