Investigating the Learning Progress of CNNs in Script Identification Using Gradient Values

2019 
Demand for automatic translation based on Camera-based Multilingual Optical Character Recognition (CM-OCR) is increasing. CM-OCR methods usually employ a script identification step before character recognition. Recent approaches to script identification rely on Convolutional Neural Networks (CNNs) because of their promising performance in image recognition tasks. However, researchers have stressed the importance of understanding the decision criteria of CNNs, warning against deploying them in real tasks as black-box classifiers. The purpose of this research is therefore to investigate the hyperparameter dependence of CNNs and to visualize the regions that CNNs focus on in script identification. We applied Grad-CAM to script identification, treated as an image classification task, using the SIW-13 dataset. We investigated the learning progress of CNNs by defining the value used in Grad-CAM as the "reaction" and visualized the regions the CNNs focus on. The learning process was stable when the number of hyperparameters was sufficient for the given training samples, even though the number of hyperparameters to be tuned increased. This result demonstrates that the capacity to learn training samples stably depends on the number of hyperparameters. When the capacity was insufficient, the learning process became unstable, which resulted in scripts with relatively low accuracy. Analyzing one of the low-accuracy scripts with Grad-CAM, we found that how some failures progress changes greatly depending on the hyperparameters of the CNNs. Scatter plots of the reaction against the probability clarified the capacity of the CNNs for each script.
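For readers unfamiliar with Grad-CAM, the following is a minimal sketch of the standard Grad-CAM computation only: channel weights are the spatially averaged gradients of the class score with respect to the last convolutional feature maps, and the heatmap is the ReLU of the weighted sum of those maps. The abstract does not give the exact definition of the "reaction", so this sketch does not attempt to reproduce it. The function name `grad_cam`, the plain nested-list representation, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
def grad_cam(activations, gradients):
    """Standard Grad-CAM on toy inputs (illustrative, not the paper's code).

    activations: K x H x W nested lists, feature maps of the last conv layer.
    gradients:   K x H x W nested lists, d(class score) / d(activation).
    Returns (weights, heatmap): K channel weights and an H x W heatmap.
    """
    n_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight = global average of that channel's gradient map.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]

    # Heatmap = ReLU of the weighted sum of feature maps at each position.
    heatmap = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    return weights, heatmap


# Hypothetical toy values: two 2x2 feature maps and their gradients.
acts = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[2.0, 0.0], [1.0, 1.0]],
]
grads = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
weights, heatmap = grad_cam(acts, grads)
```

In practice the activations and gradients would come from a trained CNN through a deep learning framework's automatic differentiation; the nested lists here only keep the sketch self-contained.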