Linguistic summarization using a weighted N-gram language model based on the similarity of time-series data

2016 
This paper describes a method for verbalizing trends in time-series data. As an example, we use the Nikkei Stock Average and develop a method that generates natural language sentences describing how the stock price moves in the market. The method has three steps. First, we classify all the time-series data, including the newly observed series that is the target to be verbalized, by means of spectral clustering with Dynamic Time Warping distance as the similarity metric. Second, we build a bi-gram language model for the target series as a weighted combination of the bi-gram language models of the other time series in the same cluster, where each weight is determined by the similarity between the target series and that other series. Third, we generate a linguistic summary of the target series by using dynamic programming to find the most likely combination of words under the weighted bi-gram model. In experiments with various numbers of clusters in the spectral clustering, we confirmed that our method generates natural language sentences that properly describe the trends of the stock price.
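The second and third steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clustering step is skipped (the `neighbours` are assumed to be the series already placed in the target's cluster), the similarity `1 / (1 + DTW distance)` is an assumed weighting, and the function names `dtw_distance`, `weighted_bigram`, and `most_likely_sentence` are hypothetical. The decoder also assumes a fixed sentence length, which the abstract does not specify.

```python
import math
from collections import defaultdict


def dtw_distance(a, b):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def weighted_bigram(target, neighbours):
    """Build P(w2 | w1) for the target from similarity-weighted bigram counts.

    neighbours: list of (series, sentences) pairs from the same cluster.
    Assumption: weight = 1 / (1 + DTW distance); the paper may use another form.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for series, sentences in neighbours:
        weight = 1.0 / (1.0 + dtw_distance(target, series))
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            for w1, w2 in zip(tokens, tokens[1:]):
                counts[w1][w2] += weight
    # Normalize the weighted counts into conditional probabilities.
    return {
        w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
        for w1, nxt in counts.items()
    }


def most_likely_sentence(model, length):
    """Dynamic programming over word positions (Viterbi-style).

    Returns the most probable sentence of exactly `length` words,
    or None if no such sentence can reach the end marker.
    """
    best = {"<s>": (0.0, [])}  # last word -> (log probability, words so far)
    for _ in range(length):
        step = {}
        for w1, (logp, words) in best.items():
            for w2, p in model.get(w1, {}).items():
                if w2 == "</s>":
                    continue
                cand = (logp + math.log(p), words + [w2])
                if w2 not in step or cand[0] > step[w2][0]:
                    step[w2] = cand
        best = step
    finals = [
        (logp + math.log(model[w]["</s>"]), words)
        for w, (logp, words) in best.items()
        if "</s>" in model.get(w, {})
    ]
    if not finals:
        return None
    return " ".join(max(finals, key=lambda f: f[0])[1])
```

Calling `weighted_bigram(target, neighbours)` and passing the result to `most_likely_sentence(model, length)` yields a sentence dominated by the descriptions of the series closest to the target, since their bigram counts carry the largest weights.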