A Novel Approach for Designing Indian Regional Language Based Raw-Text Extractor and Unicode Font-Mapping Tool

2009 
Information Extraction (IE) is the task of extracting specific information from a collection of documents. Information on the web is generally structured in HTML or XML, and IE from such structured documents typically relies on learning techniques for pattern matching in the content. In this paper, we propose a novel approach to interactive information extraction. We describe how this approach enables even a naive user to extract Indian regional language documents from a web document efficiently, in a manner similar to using a standard search engine. In effect, it behaves like a pre-programmed information extraction engine.
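To make the idea of raw-text extraction from a web document concrete, the following is a minimal sketch and not the paper's method. It uses Python's standard `html.parser` to collect text nodes and keeps only those containing characters from one Unicode script block. The names `RawTextExtractor`, `in_script`, and `extract_regional_text` are hypothetical, and the choice of the Devanagari block (U+0900 to U+097F) as the target script is an assumption made purely for illustration.

```python
from html.parser import HTMLParser

# Assumption: Devanagari (U+0900..U+097F) stands in for "an Indian regional
# language"; other scripts would need their own Unicode block range.
DEVANAGARI = (0x0900, 0x097F)


class RawTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def in_script(ch, block=DEVANAGARI):
    """True if the character's code point falls inside the given block."""
    return block[0] <= ord(ch) <= block[1]


def extract_regional_text(html, block=DEVANAGARI):
    """Return the text chunks that contain at least one character of the block."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return [c for c in parser.chunks if any(in_script(ch, block) for ch in c)]


if __name__ == "__main__":
    page = (
        "<html><head><style>p{}</style></head><body>"
        "<p>Hello</p><p>नमस्ते दुनिया</p>"
        '<script>var x="नमस्ते";</script></body></html>'
    )
    for chunk in extract_regional_text(page):
        print(chunk)
```

This sketch covers only the extraction step for text that is already Unicode-encoded. Converting text stored in legacy, font-specific encodings to Unicode would need a per-font mapping table, which is not shown here.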