Reconfigurable Web Wrapper Agents for Web Information Integration.
2003
This paper provides a solution for rapidly building software agents that can serve as Web wrappers for biological information integration. We define an XML-based language called WNDL, which provides a representation of a Web browsing session. A WNDL script describes how to locate, extract, and combine the data. By executing different WNDL scripts, a user can automate virtually all types of Web browsing sessions. We also describe IEPAD, a data extractor based on pattern discovery techniques. IEPAD allows our software agents to automatically discover the rules for extracting the contents of a structurally formatted Web page, without the need to label a Web page to train a wrapper. With a programming-by-example authoring tool, a user can generate a complete Web wrapper agent by browsing the target Web sites. We have built a variety of applications to demonstrate the feasibility of our approach.
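To make the idea of label-free extraction by pattern discovery more concrete, the following is a minimal sketch, not IEPAD's actual algorithm. It assumes a page can be encoded as a sequence of tag tokens, and it swaps in plain n-gram counting as the discovery step. The function names (`tokenize`, `discover_pattern`, `extract_records`) and the sample page are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def tokenize(html):
    """Encode a page as (token, text) pairs; text runs become the token 'TEXT'."""
    pairs = []
    for m in re.finditer(r"<(/?)(\w+)[^>]*>|([^<]+)", html):
        closing, name, text = m.groups()
        if name:
            pairs.append((("/" if closing else "") + name.lower(), None))
        elif text.strip():
            pairs.append(("TEXT", text.strip()))
    return pairs


def discover_pattern(tokens, min_len=3, max_len=10):
    """Return the most frequent repeated token n-gram, preferring longer ones.

    Illustrative stand-in for a pattern discovery step (an assumption,
    not the method described in the paper).
    """
    counts = Counter()
    for n in range(min_len, min(max_len, len(tokens)) + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    repeated = {p: c for p, c in counts.items() if c >= 2}
    if not repeated:
        return None
    return max(repeated, key=lambda p: (repeated[p], len(p)))


def extract_records(pairs, pattern):
    """Collect the text fields of every non-overlapping match of the pattern."""
    tokens = [t for t, _ in pairs]
    records, i, n = [], 0, len(pattern)
    while i + n <= len(tokens):
        if tuple(tokens[i:i + n]) == pattern:
            records.append([txt for _, txt in pairs[i:i + n] if txt is not None])
            i += n
        else:
            i += 1
    return records


# Hypothetical sample page with three structurally identical records.
page = "<ul><li><b>A</b>1</li><li><b>B</b>2</li><li><b>C</b>3</li></ul>"
pairs = tokenize(page)
pattern = discover_pattern([t for t, _ in pairs])
records = extract_records(pairs, pattern)
```

In this sketch the repeated structure of the list items is found without any labeled training example, which is the property the abstract attributes to IEPAD. A realistic extractor would also need to tolerate records that differ slightly in structure.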