The Aware toolbox for the detection of law infringements on web pages
2010
In the project Aware we aim to develop an automatic assistant for the detection of law infringements on web
pages. The motivation for this project is that many authors of web pages at some point infringe copyright or
other laws, mostly without being aware of it, and are increasingly confronted with costly legal
warnings.
As the legal environment is constantly changing, an important requirement of Aware is that the domain
knowledge can be defined initially and then maintained by numerous legal experts working remotely, without further
assistance from computer scientists. Consequently, the software platform was chosen to be a web-based generic
toolbox that can be configured to suit individual analysis experts, definitions of the analysis flow, information
gathering and report generation. The report generated by the system summarizes all critical elements of a given
web page and provides case-specific hints to the page author, and thus forms a new type of service.
Regarding the analysis subsystems, Aware mainly builds on existing state-of-the-art technologies. Their
usability has been evaluated for each intended task. In order to control the heterogeneous analysis components
and to gather the information, a lightweight scripting shell has been developed. This paper describes the analysis
technologies, which range from text-based information extraction through optical character recognition and phonetic
fuzzy string matching to a set of image analysis and retrieval tools, as well as the scripting language used to define
the analysis flow.
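
To illustrate what phonetic fuzzy string matching can look like in practice, the following is a minimal sketch and not the Aware implementation. The abstract does not name a specific algorithm, so classic Soundex combined with a character-level similarity ratio is used here as a stand-in. The function names `soundex` and `phonetic_matches`, the threshold `min_ratio`, and the example terms are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Soundex digit groups (assumption: classic American Soundex as a stand-in
# for whatever phonetic scheme the project actually uses).
_CODES = {}
for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    out = [first.upper()]
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return ("".join(out) + "000")[:4]


def phonetic_matches(text, protected_terms, min_ratio=0.6):
    """Find words in text that sound like a protected term.

    A word is reported when its Soundex code equals that of a term and the
    spelling similarity is at least min_ratio (hypothetical threshold).
    """
    hits = []
    for token in re.findall(r"[^\W\d_]+", text):
        for term in protected_terms:
            if soundex(token) != soundex(term):
                continue
            ratio = SequenceMatcher(None, token.lower(), term.lower()).ratio()
            if ratio >= min_ratio:
                hits.append((token, term))
    return hits


if __name__ == "__main__":
    # Hypothetical page text and term list, for illustration only.
    print(phonetic_matches("Buy cheap Addidas shoes", ["Adidas"]))
```

Combining a phonetic code with a spelling-similarity check is one way to catch misspelled or deliberately altered terms while limiting false positives from short words that happen to share a code. A real system would likely need language-specific phonetic rules.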