SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing

Isabel Metzger, Emir Y. Haskovic, Allison Black, Whitley M. Yi, Rajat S. Chandra, Mark T. Rutledge, William McMahon, Yindalon Aphinyanaphongs

2020

This paper presents our approach to multi-class text categorization of tweets that mention prescription medications. Using natural language processing techniques, we classify each tweet as indicative of potential abuse/misuse (A), consumption/non-abuse (C), mention-only (M), or an unrelated reference (U). Data augmentation increased our training and validation corpora from 13,172 tweets to 28,094 tweets. We also trained word embeddings on domain-specific social media and medical corpora. Our hybrid pipeline of an attention-based CNN with post-processing was the best-performing system in Task 4 of SMM4H 2020, with an F1 score of 0.51 for class A.
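To make the reported metric concrete, the following is a minimal sketch of how an F1 score for a single class (here class A) can be computed in a four-class setting. It is not the authors' evaluation code: the function name `class_f1` and the toy labels are hypothetical and were invented purely for illustration.

```python
from typing import Sequence

# The four categories named in the abstract.
LABELS = ("A", "C", "M", "U")  # abuse/misuse, consumption, mention-only, unrelated


def class_f1(gold: Sequence[str], pred: Sequence[str], target: str = "A") -> float:
    """Return the F1 score for one class, treating `target` as the positive label.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)

    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (not shared-task data).
toy_gold = ["A", "A", "C", "M", "U", "A", "C", "M"]
toy_pred = ["A", "C", "C", "M", "U", "A", "A", "M"]
toy_score = class_f1(toy_gold, toy_pred, target="A")
```

Scoring only class A means that errors among the C, M, and U classes affect the result only when they involve a tweet that is, or is predicted to be, class A.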

Keywords:

Social media
Deep learning
F1 score
Computer science
Natural language processing
Text categorization
Medical prescription
Prescription drug abuse
Artificial intelligence
