C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis

2021 
Advances in technology have propelled the growth of methods that can create desired multimedia content. Automatic image synthesis is one such area that has drawn considerable research attention. In contrast, audio-visual scene synthesis, especially from document images, remains challenging and less investigated. To bridge this gap, we propose a novel framework, Comic-to-Video Network (C2VNet), which synthesizes the video panel by panel from a comic strip and eventually produces a full-length video (with audio) of a digitized or born-digital storybook. This step-by-step synthesis process enables the creation of a high-resolution video. The proposed work's primary contributions are: (1) a novel end-to-end comic-strip-to-audio-visual scene synthesis framework, (2) an improved panel and text-balloon segmentation technique, and (3) a dataset of digitized comic storybooks in the English language with complete annotations and binary masks of the text balloons. Qualitative and quantitative experimental results demonstrate the effectiveness of the proposed C2VNet framework for automatic audio-visual scene synthesis.
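To make the panel-by-panel synthesis idea concrete, below is a minimal, hypothetical sketch of such a pipeline. It is not the authors' implementation: the learned panel segmentation network is replaced here by a classical contour heuristic, the text-balloon segmentation and text-to-speech audio stages are omitted, and file names, thresholds, and per-panel durations are illustrative assumptions.

```python
"""Minimal sketch of a comic-to-video pipeline in the spirit of C2VNet.

Assumptions: OpenCV (cv2) and NumPy are available; "comic_page.png" is a
scanned comic page. Panel extraction below is a classical stand-in for the
paper's learned segmentation; balloon TTS audio is left out.
"""
import cv2
import numpy as np

def extract_panels(page_bgr, min_area_frac=0.02):
    """Find large rectangular regions (panel candidates) on a comic page."""
    gray = cv2.cvtColor(page_bgr, cv2.COLOR_BGR2GRAY)
    # Panels are typically bright regions separated by dark gutters/borders.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    page_area = page_bgr.shape[0] * page_bgr.shape[1]
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) > min_area_frac * page_area]
    # Approximate reading order: bin by vertical position, then left-to-right.
    boxes.sort(key=lambda b: (b[1] // 100, b[0]))
    return [page_bgr[y:y + h, x:x + w] for (x, y, w, h) in boxes]

def panels_to_video(panels, out_path="story.mp4", size=(1280, 720),
                    seconds_per_panel=3.0, fps=24):
    """Hold each panel on screen for a fixed duration; in a full system the
    per-panel duration would follow the synthesized balloon audio, which
    would then be muxed into the video (e.g., with ffmpeg)."""
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, size)
    for panel in panels:
        frame = cv2.resize(panel, size)  # upscale panel to video resolution
        for _ in range(int(seconds_per_panel * fps)):
            writer.write(frame)
    writer.release()

if __name__ == "__main__":
    page = cv2.imread("comic_page.png")  # assumed input scan
    panels_to_video(extract_panels(page))
```

In the paper's framework, the contour heuristic above is replaced by learned panel and text-balloon segmentation, and each panel's display time is driven by speech synthesized from its balloon text, which is what yields a full-length audio-visual rendering of the storybook.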