Fast Big Textual Data Parsing in Distributed and Parallel Computing Environment

2014 
Currently, a tremendous number of scientific and technical articles are being published as scientific and technical fields develop rapidly. Systems have also been proposed that provide useful information to users by extracting it from such articles. These systems must be able to extract information from a massive number of documents quickly and reliably. However, legacy parsers such as the Stanford parser and Enju cannot cope with such a large volume of documents, because they analyze a wide context range within each sentence and therefore take a long time to run. In this paper, we report on the development of a parser based on MapReduce, a distributed and parallel programming model. Our parser achieves about nineteen times the performance of a state-of-the-art legacy parser.
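
To illustrate the general idea, the following is a minimal sketch of how sentence parsing can be distributed with Hadoop MapReduce: each map task parses the sentences in its own input split, so the work is spread across cluster nodes. The class name, key/value layout, and the parse() placeholder are assumptions for illustration only; the abstract does not describe the authors' actual implementation or the parser used inside their tasks.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SentenceParseMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable offset, Text sentence, Context context)
                throws IOException, InterruptedException {
            // Each map task parses the sentences in its own input split
            // independently, so total parsing time shrinks roughly with the
            // number of worker nodes in the cluster.
            String parseResult = parse(sentence.toString());
            context.write(sentence, new Text(parseResult));
        }

        // Placeholder parse step (assumption): a real job would call an
        // actual parser library here; the abstract does not name the parser
        // invoked inside the authors' MapReduce tasks.
        private static String parse(String sentence) {
            return "(S " + sentence + ")";
        }
    }

In this arrangement the map phase does essentially all of the work, since sentences can be parsed independently of one another; this is what allows the throughput to scale with the size of the cluster.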