A study on keyword detection using weighted similarity and character sequence for low-resolution medical documents

Makoto Kawamura, Hiroharu Kawanaka, Shunsuke Doi, Takahiro Suzuki, Haruhiko Takase, Shinji Tsuruoka

2015

With the spread of Hospital Information Systems, many medical documents have been computerized. In addition, most paper documents that predate computerization have been scanned and archived as document images. These images are usually converted to text data using document analysis techniques and an Optical Character Reader (OCR), and then archived for medical document retrieval. However, the resolution of some documents is insufficient for character recognition because of storage space constraints, scanning regulations, and so on. As a result, desired keywords cannot be searched for in these documents, and the documents are still not used effectively in medical document retrieval systems. In this study, we discuss keyword detection and extraction methods for these document images. As a first step, this paper proposes a method to detect and extract desired words from these documents using weighted dissimilarity and character sequence. Evaluation experiments using actual medical documents are conducted to assess the effectiveness of the proposed method.
