Chinese Character Embedding Based Semantic Query Algorithm for Semi-structured Corpora

2017 
Semantic query is a common natural language processing task in many application scenarios. Given an input phrase, the system is expected to return the phrases in a corpus that have the same or a similar meaning. Because exact string matching cannot satisfy semantic requirements, especially when the query phrase shares no words with the target phrases, approaches based on word embedding vectors learned by neural networks are widely used, since these vectors carry rich semantic information. However, for a semi-structured corpus with no explicit context, these methods cannot be applied directly. In this paper, we propose CSQ, a semantic query algorithm based on Chinese character embedding. Our algorithm computes the vectors of larger language units from those of smaller language units, which are computed by classical embedding models. The composition method is designed to compensate for the lack of context, on which current embedding algorithms fundamentally depend. Experiments show the effectiveness of the approach on semantic query tasks over semi-structured corpora.
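The general idea of building phrase vectors from character vectors and ranking corpus phrases by similarity can be illustrated with a minimal sketch. This is not the CSQ composition method, which the abstract does not specify: the sketch assumes plain averaging of character vectors and cosine similarity, and the vectors in `CHAR_VECTORS`, as well as the function names `compose`, `cosine`, and `semantic_query`, are hypothetical values and names made up for illustration.

```python
import math

# Toy 3-dimensional character vectors (made-up values for illustration).
# In practice these would come from a trained character embedding model.
CHAR_VECTORS = {
    "北": [0.9, 0.1, 0.0],
    "京": [0.8, 0.2, 0.1],
    "首": [0.7, 0.3, 0.0],
    "都": [0.8, 0.1, 0.2],
    "苹": [0.0, 0.9, 0.3],
    "果": [0.1, 0.8, 0.4],
}


def compose(phrase):
    """Build a phrase vector by averaging its known character vectors.

    Averaging is an assumed composition rule, not the one used by CSQ.
    Returns None when no character of the phrase has a vector.
    """
    vecs = [CHAR_VECTORS[ch] for ch in phrase if ch in CHAR_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_query(query, corpus, top_k=2):
    """Return up to top_k corpus phrases ranked by similarity to the query."""
    query_vec = compose(query)
    if query_vec is None:
        return []
    scored = []
    for phrase in corpus:
        phrase_vec = compose(phrase)
        if phrase_vec is not None:
            scored.append((cosine(query_vec, phrase_vec), phrase))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored[:top_k]]


if __name__ == "__main__":
    # The query shares no characters with either corpus phrase.
    print(semantic_query("北京", ["苹果", "首都"], top_k=1))
```

With these toy vectors, the query 北京 ranks 首都 above 苹果 even though it shares no characters with either phrase, which is the situation where exact string matching fails.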