WEB DOCUMENT SEGMENTATION USING FREQUENT TERM SETS FOR SUMMARIZATION

2012 
Query-sensitive summarization aims at extracting the query-relevant contents from web documents. Web page segmentation reduces the run-time overhead of summarization systems by grouping the related contents of a web page into segments. At query time, the query-relevant segments of the web page are identified, and important sentences from these segments are extracted to compose the summary. The DOM tree structures of the web documents are used to segment the contents: leaf nodes of the DOM trees are merged to form segments according to statistical and linguistic similarity measures. The proposed system has been evaluated with an intrinsic approach that makes use of a user satisfaction index, and its performance is compared with summarization that does not use preprocessed segments. The system is more promising than measures such as cosine similarity and the Jaccard measure, which rely on sparse term-frequency vectors, because only the most frequent term sets are considered when measuring relevance. Only the relevant segments need to be processed at run time for summarization, which reduces the time complexity of the summarization process.
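The segmentation and query-time selection steps described above can be illustrated with a minimal Python sketch. The abstract does not give the actual merging rule, similarity measure, or thresholds, so everything here is an assumption made for illustration: the leaf-node texts are taken as already extracted from the DOM tree, and the names `frequent_terms`, `overlap`, `segment`, and `relevant_segments`, as well as the parameters `k` and `threshold`, are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase alphabetic tokens longer than two characters.

    The length filter is a crude stand-in for stop-word removal (assumption).
    """
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]


def frequent_terms(text, k=5):
    """Return the k most frequent terms of a text block as a set."""
    return {term for term, _ in Counter(tokens(text)).most_common(k)}


def overlap(a, b):
    """Share of the smaller term set that also occurs in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def segment(leaf_texts, k=5, threshold=0.3):
    """Merge consecutive leaf-node texts whose frequent term sets overlap.

    A leaf joins the current segment when the overlap reaches `threshold`;
    otherwise it starts a new segment. Both values are illustrative.
    """
    segments = []
    for text in leaf_texts:
        if segments and overlap(
            frequent_terms(segments[-1], k), frequent_terms(text, k)
        ) >= threshold:
            segments[-1] = segments[-1] + " " + text
        else:
            segments.append(text)
    return segments


def relevant_segments(segments, query, k=5):
    """Keep segments whose frequent terms share at least one query term."""
    query_terms = set(tokens(query))
    return [s for s in segments if frequent_terms(s, k) & query_terms]


if __name__ == "__main__":
    leaves = [
        "Python lists store items",
        "Python lists support slicing",
        "Weather today: heavy rain",
        "Heavy rain floods roads",
    ]
    segs = segment(leaves)
    for s in relevant_segments(segs, "rain forecast"):
        print(s)
```

Because the segments are built once, ahead of query time, only `relevant_segments` runs per query, which mirrors the run-time saving the abstract claims. Sentence extraction from the selected segments is left out of this sketch.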