An Algebraic Approach to Rule-Based Information Extraction

2008 
Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large datasets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse datasets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally, we validate the potential benefits of our approach by extensive experiments over real-world blog data.