Rigorous evaluation of vector-associated model performance (REVAMP)

2021 
DevOps engineers take many factors into account when assessing the suitability of AI/ML algorithms for operations. They rely on documentation about data requirements, parameter settings, theoretical limits, and operating characteristics to anticipate how well an algorithm will perform in the field before writing a single line of code. However, the USAF currently lacks code quality and documentation standards for AI/ML, which forces downstream consumers to make assumptions about algorithm behavior that often lead to unexpected results and wasted effort. We therefore present a preliminary set of criteria for judging the maturity of word embedding algorithms in terms of reproducibility, testability, and documentation quality. We hope the criteria will grow into a set of quality requirements that govern how the DoD procures AI/ML capabilities, and eventually motivate the need for an ML development maturity model. Our nascent evaluation criteria surfaced during our own struggles to replicate and compare the performance of a well-known word embedding algorithm (Word2Vec) and a custom graph embedding algorithm (IRI2Vec) procured specifically to connect missions across air tasking orders (ATOs). We walk through two case studies in which we applied our evaluation criteria to assess the maturity of both Word2Vec and IRI2Vec.
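
To make the reproducibility criterion concrete, the sketch below trains a word embedding model twice with identical seeds and checks that the learned vectors match run to run. It uses gensim's Word2Vec implementation and a hypothetical toy corpus as stand-ins; neither is necessarily what the paper evaluated, and the hyperparameters are illustrative assumptions.

    # Within-process reproducibility check for a word embedding model.
    # Assumptions: gensim's Word2Vec (a stand-in, not necessarily the
    # implementation evaluated in the paper) and a hypothetical toy corpus.
    import numpy as np
    from gensim.models import Word2Vec

    corpus = [
        ["strike", "mission", "assigned", "to", "squadron"],
        ["refueling", "mission", "supports", "strike", "package"],
    ]

    def train(seed: int) -> Word2Vec:
        # workers=1 removes thread-scheduling nondeterminism; a fixed
        # seed pins weight initialization and negative sampling.
        return Word2Vec(corpus, vector_size=16, window=2, min_count=1,
                        workers=1, seed=seed, epochs=20)

    a, b = train(seed=42), train(seed=42)
    for word in a.wv.index_to_key:
        assert np.allclose(a.wv[word], b.wv[word]), f"run-to-run drift for {word!r}"
    print("identical seeds reproduce identical vectors")

Reproducing results across separate interpreter launches additionally requires pinning the PYTHONHASHSEED environment variable, since gensim seeds per-word vectors using Python's string hash.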