Title Extraction from Loosely Structured Data Records

2008 
In this paper, we present a novel method for extracting titles from loosely structured data records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the data records, we select, from the candidate titles, the one with the greatest length of "same content" as the accurate title. For a Web page whose title occurs before the first data record, the candidate title with the greatest length of "different content" is taken as the accurate title. Our experimental results demonstrate that the automatic algorithm is robust and effective on two databases collected from the Internet.