GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce

2020 
In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    39
    References
    14
    Citations
    NaN
    KQI
    []