Annotating Documents using Active Learning Methods for a Maintenance Analysis Application

2020 
The aircraft cargo industry still keeps vast amounts of the maintenance history of aircraft components as electronic (i.e. scanned) but unsearchable images. For a given supplier, there can be hundreds of thousands of image documents, only some of which contain useful information. Supervised machine learning techniques have been shown to be effective in recognising these documents for further information extraction. A well-known deficiency of supervised learning approaches, however, is that annotating enough documents to build an effective model requires valuable human effort. This paper first shows how to obtain a representative sample from a supplier's corpus. Given this sample of unlabelled documents, an active learning approach is used to select which documents to annotate first, using a normalised certainty measure derived from a soft classifier's prediction distribution. Finally, the accuracy of various selection approaches using this certainty measure is compared at each iteration of the active learning cycle. The experiments show that a greedy selection method using this measure can significantly reduce the number of annotations required to reach a given accuracy. The results provide valuable information for users and, more generally, illustrate an effective deployment of a machine learning application.
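The abstract does not define the normalised certainty measure, so the sketch below assumes one common choice: one minus the Shannon entropy of the classifier's class-probability vector divided by its maximum value, log(K). The greedy step then picks the least certain unlabelled documents to annotate next. Both function names (`normalised_certainty`, `greedy_select`) are hypothetical, and this is an illustration of uncertainty-based selection under that assumption, not the paper's exact method.

```python
import math
from typing import List, Sequence


def normalised_certainty(probs: Sequence[float]) -> float:
    """Map a soft classifier's class-probability vector to a certainty in [0, 1].

    ASSUMED definition (not from the paper): 1 - H(p) / log(K), where H is the
    Shannon entropy and K the number of classes. A uniform distribution gives 0,
    a one-hot distribution gives 1.
    """
    k = len(probs)
    if k < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(k)


def greedy_select(pred_dists: List[Sequence[float]], batch_size: int) -> List[int]:
    """Greedily return the indices of the `batch_size` least-certain documents."""
    scored = sorted((normalised_certainty(p), i) for i, p in enumerate(pred_dists))
    return [i for _, i in scored[:batch_size]]


# Hypothetical predicted distributions for four unlabelled documents
# (e.g. "useful" vs "not useful").
unlabelled = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
print(greedy_select(unlabelled, 2))  # [3, 1]
```

In each iteration of the cycle, the selected documents would be annotated, added to the training set, and the classifier retrained before scoring the remaining pool again. A random-selection baseline can be obtained by replacing the sort with a random sample of indices.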