An E-Commerce Dataset in French for Multi-modal Product Categorization and Cross-Modal Retrieval.

2021 
A multi-modal dataset of ninety nine thousand product listings are made available from the production catalog of Rakuten France, a major e-commerce platform. Each product in the catalog data contains a textual title, a (possibly empty) textual description and an associated image. The dataset has been released as part of a data challenge hosted by the SIGIR ECom’20 Workshop. Two tasks are proposed, namely a principal large-scale multi-modal classification task and a subsidiary cross-modal retrieval task. This real world dataset contains around 85K products and their corresponding product type categories that are released as training data and around 9.5K and 4.5K products are released as held-out test sets for the multi-modal classification and cross-modal retrieval tasks respectively. The evaluation is run in two phases to measure system performance, first on 10% of the test data, and then on the rest 90% of the test data. The different systems are evaluated using macro-F1 score for the multi-modal classification task and recall@1 for the cross-modal retrieval task. Additionally, a robust baseline system for the multi-modal classification task is proposed. The top performance obtained at the end of the second phase is \(91.44\%\) macro-F1 and \(34.28\%\) recall@1 for the two tasks respectively.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    22
    References
    0
    Citations
    NaN
    KQI
    []