Imbalanced data classification: Using transfer learning and active sampling

2023 
Recently, deep learning models have made great breakthroughs in the field of computer vision, relying on large-scale class-balanced datasets. However, most of them do not consider class-imbalanced data. In practice, a class-imbalanced distribution degrades model performance and reduces the generalization of these models. In addition, in the era of big data, many applications need to use real-time visual data, which different mobile devices generate continuously and in huge volumes. However, few studies use real-time data from information systems: such data are easy to capture but difficult to use. To address these problems, we propose a new model, the Transfer Learning Classifier (TLC), which is based on transfer learning and designed to deal with class-imbalanced data. The model includes an active sampling module, a real-time data augmentation module, and a DenseNet module: (1) the newly proposed active sampling module dynamically adjusts the number of samples with a skewed distribution; (2) the data augmentation module expands the real-time data to avoid over-fitting and insufficient data; (3) the DenseNet module is a standard DenseNet network pre-trained on the ImageNet dataset and transferred to TLC for relearning, with its memory usage adjusted to make it more efficient. In addition, we have applied a new end-to-end real-time data storage and analysis system. Extensive experiments have been carried out on four different long-tailed datasets. Experimental results show that the proposed TLC model can effectively deal with static data as well as real-time data, and that its classification performance on imbalanced data is better than that of existing models.
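The abstract does not specify how the active sampling module adjusts sample counts, so the following is only a minimal sketch of a generic, swapped-in technique: inverse-frequency resampling, in which each example is drawn with probability proportional to the reciprocal of its class size. The function names `inverse_frequency_weights` and `rebalanced_indices` and the toy label distribution are illustrative assumptions, not part of TLC.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per example, equal to 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def rebalanced_indices(labels, num_samples, seed=0):
    """Draw example indices with replacement so each class is about equally likely.

    Hypothetical helper for illustration; this is plain inverse-frequency
    resampling, not the paper's active sampling module.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Toy long-tailed label set (assumed for illustration): 90 / 9 / 1 examples.
labels = [0] * 90 + [1] * 9 + [2] * 1
indices = rebalanced_indices(labels, num_samples=3000)
resampled = Counter(labels[i] for i in indices)
```

Because the weights within each class sum to the same total, the three classes are drawn at roughly equal rates despite the 90/9/1 split. The minority example is repeated many times, which is one reason resampling of this kind is usually paired with data augmentation.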