Beyond n-grams, tf-idf, and word indicators for text: Leveraging the Python API for vector embeddings

2021 
This talk will share strategies that Stata users can apply to obtain more informative word, sentence, and document vector embeddings of the text in their data. Indicator and bag-of-words strategies can be useful for some types of text analytics, but they do not capture the semantic relationships between words that give language its meaning and structure. Vector space embeddings attempt to preserve these relationships and, in doing so, can provide more robust numerical representations of text for subsequent analysis. I will share strategies for using existing tools from the Python ecosystem with Stata so that you can bring advances in NLP into your Stata workflow.
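
To make the contrast between indicator representations and embeddings concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the three-word vocabulary and the three-dimensional "embedding" values are invented purely for illustration, and the helper names (`cosine`, `indicator`, `toy_embeddings`) are hypothetical. Real embeddings would come from a trained model in the Python ecosystem and would have many more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors to avoid division by zero.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy vocabulary (assumption: chosen only for this illustration).
vocab = ["car", "automobile", "banana"]

def indicator(word):
    """One-hot word indicator over the toy vocabulary."""
    return [1.0 if word == w else 0.0 for w in vocab]

# Made-up dense vectors standing in for learned embeddings.
# These numbers are NOT from any real model.
toy_embeddings = {
    "car":        [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "banana":     [0.0, 0.1, 0.9],
}

# Indicators treat every pair of distinct words as unrelated.
bow_sim = cosine(indicator("car"), indicator("automobile"))

# Dense vectors can place related words close together.
emb_sim = cosine(toy_embeddings["car"], toy_embeddings["automobile"])
emb_far = cosine(toy_embeddings["car"], toy_embeddings["banana"])

print("indicator  car vs automobile:", bow_sim)
print("embedding  car vs automobile:", round(emb_sim, 3))
print("embedding  car vs banana:    ", round(emb_far, 3))
```

With indicator vectors, "car" and "automobile" are orthogonal, so their similarity is zero no matter how close they are in meaning. In the toy dense vectors the two synonyms score far higher than the unrelated pair, which is the property that vector space embeddings aim to learn from data.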