Development of an Automatic Trend Exploration System using the MuST Data Collection

Masaki Murata,Koji Ichii,Qing Ma,Tamotsu Shirado,Toshiyuki Kanamaru,Sachiyo Tsukawaki,Hitoshi Isahara

Development of an Automatic Trend Exploration System using the MuST Data Collection

2006

The automatic extraction of trend information from text documents such as newspaper articles would be useful for exploring and examining trends. To enable this, we used data sets provided by a workshop on multimodal summarization for trend information (the MuST Workshop) to construct an automatic trend exploration system. This system first extracts units, temporals, and item expressions from newspaper articles, then it extracts sets of expressions as trend information, and finally it arranges the sets and displays them in graphs. For example, when documents concerning the politics are given, the system extracts "%" and "Cabinet approval rating" as a unit and an item expression including temporal expressions. It next extracts values related to "%". Finally, it makes a graph where temporal expressions are used for the horizontal axis and the value of percentage is shown on the vertical axis. This graph indicates the trend of Cabinet approval rating and is useful for investigating Cabinet approval rating. Graphs are obviously easy to recognize and useful for understanding information described in documents. In experiments, when we judged the extraction of a correct graph as the top output to be correct, the system accuracy was 0.2500 in evaluation A and 0.3334 in evaluation B. (In evaluation A, a graph where 75% or more of the points were correct was judged to be correct; in evaluation B, a graph where 50% or more of the points were correct was judged to be correct.) When we judged the extraction of a correct graph in the top five outputs to be correct, accuracy rose to 0.4167 in evaluation A and 0.6250 in evaluation B. Our system is convenient and effective because it can output a graph that includes trend information at these levels of accuracy when given only a set of documents as input.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations