Key Points
Question
Can natural language processing be used to gain real-time temporal and geospatial information from social media data about opioid abuse?
Findings
In this cross-sectional, population-based study of 9006 social media posts, supervised machine learning methods performed automatic 4-class classification of opioid-related social media chatter with a maximum F1 score of 0.726. Rates of automatically classified opioid abuse–indicating social media posts from Pennsylvania correlated with county-level overdose death rates and with 4 national survey metrics at the substate level.
Meaning
The findings suggest that automatic processing of social media data, combined with geospatial and temporal information, may provide close to real-time insights into the status and trajectory of the opioid epidemic.
Importance
Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States.
Objective
To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning social media chatter.
Design, Setting, and Participants
This cross-sectional, population-based study was conducted from December 1, 2017, to August 31, 2019, and used more than 3 years of publicly available social media posts on Twitter, dated from January 1, 2012, to October 31, 2015, that were geolocated in Pennsylvania. Opioid-mentioning tweets were extracted using prescription and illicit opioid names, including street names and misspellings. Social media posts (tweets) (n = 9006) were manually categorized into 4 classes, and training and evaluation of several machine learning algorithms were performed. Temporal and geospatial patterns were analyzed with the best-performing classifier on unlabeled data.
Main Outcomes and Measures
Pearson and Spearman correlations of county- and substate-level abuse-indicating tweet rates with opioid overdose death rates from the Centers for Disease Control and Prevention WONDER database and with 4 metrics from the National Survey on Drug Use and Health (NSDUH) for 3 years were calculated. Classifier performances were measured through microaveraged F1 scores (harmonic mean of precision and recall) or accuracies and 95% CIs.
Results
A total of 9006 social media posts were annotated, of which 1748 (19.4%) were related to abuse, 2001 (22.2%) were related to information, 4830 (53.6%) were unrelated, and 427 (4.7%) were not in the English language. Yearly rates of abuse-indicating social media posts showed statistically significant correlation with county-level opioid-related overdose death rates (n = 75) for 3 years (Pearson r = 0.451, P < .001; Spearman r = 0.331, P = .004). Abuse-indicating tweet rates showed consistent correlations with 4 NSDUH metrics (n = 13) associated with nonmedical prescription opioid use (Pearson r = 0.683, P = .01; Spearman r = 0.346, P = .25), illicit drug use (Pearson r = 0.850, P < .001; Spearman r = 0.341, P = .25), illicit drug dependence (Pearson r = 0.937, P < .001; Spearman r = 0.495, P = .09), and illicit drug dependence or abuse (Pearson r = 0.935, P < .001; Spearman r = 0.401, P = .17) over the same 3-year period, although the tests lacked power to demonstrate statistical significance. A classification approach involving an ensemble of classifiers produced the best performance in accuracy or microaveraged F1 score (0.726; 95% CI, 0.708-0.743).
Conclusions and Relevance
The correlations obtained in this study suggest that a social media–based approach reliant on supervised machine learning may be suitable for geolocation-centric monitoring of the US opioid epidemic in near real time.
The problem of drug addiction and overdose has reached epidemic proportions in the United States, and it is largely driven by opioids, both prescription and illicit.1 More than 72 000 overdose-related deaths in the United States were estimated to have occurred in 2017, of which more than 47 000 (approximately 68%) involved opioids,2 meaning that a mean of more than 130 people died each day from opioid overdoses, and approximately 46 of these deaths were associated with prescription opioids.3 According to the Centers for Disease Control and Prevention, the opioid crisis has hit some US states harder than others, with West Virginia, Ohio, and Pennsylvania having death rates greater than 40 per 100 000 people in 2017 and with statistically significant increases in death rates year by year.4 Studies have suggested that the state-by-state variations in opioid overdose–related deaths are multifactorial but may be associated with differences in state-level policies and laws regarding opioid prescribing practices and population-level awareness or education regarding the risks and benefits of opioid use.5 Although the geographic variation is now known, strategies for monitoring the crisis are grossly inadequate.6,7 Current monitoring strategies have a substantial time lag, meaning that the outcomes of recent policy changes, efforts, and implementations8-10 cannot be assessed close to real time. Kolodny and Frieden11 discussed some of the drawbacks of current monitoring strategies and suggested 10 federal-level steps for reversing the opioid epidemic, with improved monitoring or surveillance as a top priority.
In recent years, social media has emerged as a valuable resource for performing public health surveillance,12-15 including for drug abuse.16-18 Adoption of social media is at an all-time high19 and continues to grow. Consequently, social media chatter is rich in health-related information, which, if mined appropriately, may provide unprecedented insights. Studies have suggested that social media posts mentioning opioids and other abuse-prone substances contain detectable signals of abuse or misuse,20-22 with some users openly sharing such information, which they may not share with their physicians or through any other means.13,17,23,24 Manual analyses established the potential of social media for drug abuse research, but automated, data-centric processing pipelines are required to fully realize social media's research potential. However, the characteristics of social media data present numerous challenges to automatic processing from the perspective of natural language processing and machine learning, including the presence of misspellings, colloquial expressions, data imbalance, and noise. Some studies have automated social media mining for this task by proposing approaches such as rule-based categorization,22 supervised classification,17 and unsupervised methods.5 Studies that have compared opioid-related chatter and its association with the opioid crisis have been unsupervised in nature, and they either do not filter out information unrelated to personal abuse5 or do not quantitatively evaluate the performance of their filtering strategy.21 These and similar studies have, however, established the importance of social media data for toxicovigilance and have laid the groundwork for end-to-end automatic pipelines for using social media information in near real time.
In this cross-sectional study, we developed and evaluated the building blocks, based on natural language processing and machine learning, for an automated social media–based pipeline for toxicovigilance. The proposed approach relies on supervised machine learning to automatically characterize opioid-related chatter and combines the output of the data processing pipeline with temporal and geospatial information from Twitter to analyze the opioid crisis at a specific time and place. We believe this supervised learning–based model is more robust than unsupervised approaches because it is not dependent on the volume of the overall chatter, which fluctuates over time depending on factors such as media coverage. This study, which focused on the state of Pennsylvania, suggests that the rate of personal opioid abuse–related chatter on Twitter was reflective of opioid overdose death rates from the Centers for Disease Control and Prevention WONDER database and 4 metrics from the National Survey on Drug Use and Health (NSDUH) over a period of 3 years.
Data Collection, Refinement, and Annotation
This cross-sectional study was conducted from December 1, 2017, to August 31, 2019. It was deemed by the University of Pennsylvania Institutional Review Board to be exempt from review as all data used were publicly available. Informed consent was not necessary for this reason. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
Publicly available social media posts on Twitter from January 1, 2012, to October 31, 2015, were collected as part of a broader project through the public streaming API (application programming interface).25 The API provides access to a representative random sample of approximately 1% of the data in near real time.26 Social media posts (tweets) originating from Pennsylvania were identified through the geolocation detection process, as described in Schwartz et al.27 To include opioid-related posts only, our research team, led by a medical toxicologist (J.P.), identified keywords, including street names (relevant unambiguous street names were chosen from the US Drug Enforcement Administration website28) that represented prescription and illicit opioids. Because social media posts have been reported to include many misspellings,29 and drug names are often misspelled, we used an automatic spelling variant generator for the selected keywords.30 We observed an increase in retrieval rate for certain keywords when we combined these misspellings with the original keywords (example in eFigure 1 in the Supplement).
We wanted to exclude noisy terms with low signal-to-noise ratios before the manual annotation phase. We manually analyzed a random sample of approximately 16 000 social media posts to identify such noisy terms. We found that 4 keywords (dope, tar, skunk, and smack) and their spelling variants occurred in more than 80% of the tweets (eFigure 2 in the Supplement). Manual review performed by one of us (A.S.) and the annotators suggested that almost all social media posts retrieved by these keywords were referring to nonopioid content. For example, the term dope is typically used in social media to indicate something is good (eg, “that song is dope”). We removed all the posts mentioning these keywords, which reduced the data set from more than 350 000 to approximately 131 000, a decrease of more than 50%.
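The retrieval and noise-filtering steps described above can be reduced to simple set operations over tweet tokens. The following Python sketch illustrates the idea; the keyword lists, spelling variants, and example tweets are illustrative placeholders, not the study's actual lexicon (which is listed in eTable 4 in the Supplement).

```python
# Minimal sketch of keyword-based retrieval (with spelling variants) followed
# by removal of posts containing noisy, low signal-to-noise keywords.

# Hypothetical opioid keywords mapped to automatically generated spelling variants.
OPIOID_KEYWORDS = {
    "oxycodone": {"oxycodone", "oxycodon", "oxicodone"},
    "percocet": {"percocet", "percoset", "perkocet"},
    "heroin": {"heroin", "heroine", "herion"},
}

# Keywords found, via manual review of a sample, to retrieve mostly nonopioid content.
NOISY_KEYWORDS = {"dope", "tar", "skunk", "smack"}


def matched_keywords(text: str, lexicon: dict) -> set:
    """Return the canonical keywords whose surface forms appear in the tweet."""
    tokens = set(text.lower().split())
    return {canon for canon, variants in lexicon.items() if tokens & variants}


def keep_tweet(text: str) -> bool:
    """Keep a tweet if it mentions an opioid keyword and no noisy keyword."""
    tokens = set(text.lower().split())
    if tokens & NOISY_KEYWORDS:
        return False
    return bool(matched_keywords(text, OPIOID_KEYWORDS))


if __name__ == "__main__":
    tweets = [
        "that song is dope",              # filtered out: noisy keyword
        "ran out of my percoset refill",  # kept: spelling-variant match
    ]
    print([t for t in tweets if keep_tweet(t)])
```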
We developed annotation guidelines using the grounded theory approach.31 First, we grouped tweets into topics and then into broad categories. Four annotation categories or classes were chosen: self-reported abuse or misuse (A), information sharing (I), unrelated (U), and non-English (N). Iterative annotation of a smaller set of 550 posts was used to develop the guidelines and to increase agreement between the annotators. For the final annotation set, disagreements were resolved by a third annotator. Further details about the annotation can be found in the pilot publication32 and eTable 1 in the Supplement.
Machine Learning Models and Classification
We used the annotated posts to train and evaluate several supervised learning algorithms and to compare their performances. We experimented with 6 classifiers: naive Bayes, decision tree, k-nearest neighbors, random forest, support vector machine, and a deep convolutional neural network. Tweets were lowercased before training or evaluation. For the first 5 of the 6 classifiers (the traditional classifiers), we stemmed the terms as a preprocessing step using the Porter stemmer.33 As features for the traditional classifiers, we used word n-grams (contiguous sequences of words) along with 2 additional engineered features (word clusters and presence and counts of abuse-indicating terms) that we had found to be useful in our related past work.17 The sixth classifier, a deep convolutional neural network, consisted of 3 layers and used dense vector representations of words, commonly known as word embeddings,34 which were learned from a large social media data set.35 Because the word embeddings we used were learned from social media drug-related chatter, they captured the semantic representations of drug-related keywords.
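As a concrete illustration, the sketch below trains one traditional classifier (a support vector machine) over word n-grams, assuming scikit-learn; the text does not name a specific library, and the Porter stemming step and the engineered features (word clusters and abuse-indicating term counts) are omitted. The example posts and labels are toy values, not study data.

```python
# Minimal sketch of a word n-gram support vector machine for the 4-class task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy training posts and labels (A = abuse, I = information, U = unrelated, N = non-English).
train_texts = [
    "popping oxy again cant stop",          # A
    "new cdc report on opioid overdoses",   # I
    "stuck in traffic again",               # U
    "no hablo de eso",                      # N
]
train_labels = ["A", "I", "U", "N"]

pipeline = Pipeline([
    # Word 1- to 3-grams, lowercased (the study also stemmed tokens first).
    ("ngrams", TfidfVectorizer(ngram_range=(1, 3), lowercase=True)),
    ("svm", LinearSVC()),
])
pipeline.fit(train_texts, train_labels)
print(pipeline.predict(["took too many percs last night"]))
```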
We randomly split the annotated posts into 3 sets: training, validation, and testing. For parameter optimization of the traditional classifiers, we combined the training and validation sets and identified optimal parameter values by using 10-fold cross-validations (eTable 2 in the Supplement). For the deep convolutional neural network, we used the validation set at training time for finding optimal parameter values, given that running 10-fold cross-validation for parameter optimization of neural networks is time-consuming and hence infeasible. The best performance achieved by each classifier over the training set is presented in eTable 3 in the Supplement. To address the data imbalance between classes, we evaluated each individual classifier using random undersampling of the majority class (U) and oversampling of the pertinent smaller classes (A and I) using SMOTE (synthetic minority oversampling technique36).
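The following sketch illustrates the 2 steps just described, parameter selection with 10-fold cross-validation and oversampling with SMOTE, assuming scikit-learn and imbalanced-learn. Synthetic features stand in for the study's n-gram vectors, and the default SMOTE call oversamples every nonmajority class, a simplification of the undersampling-plus-oversampling strategy described in the text.

```python
# Minimal sketch: 10-fold cross-validated parameter search, then SMOTE oversampling.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
from imblearn.over_sampling import SMOTE

# Imbalanced 4-class toy data roughly mimicking the U > I > A > N class skew.
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=20,
    n_classes=4, weights=[0.54, 0.22, 0.19, 0.05], random_state=42,
)

# Grid search over the SVM regularization strength using 10-fold cross-validation.
search = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.1, 1, 10]}, cv=10)
search.fit(X, y)
print("best C:", search.best_params_)

# Oversample the smaller classes with SMOTE, then retrain with the selected parameter.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
clf = LinearSVC(C=search.best_params_["C"]).fit(X_res, y_res)
```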
In addition, we used ensembling strategies for combining the classifications of the classifiers. The first ensembling method was based on majority voting; the most frequent classification label by a subset of the classifiers was chosen as the final classification. In the case of ties, the classification by the best-performing individual classifier was used. For the second ensembling approach, we attempted to improve recall for the 2 nonmajority classes (A and I), which represented content-rich posts. For this system variant, if any post was classified as A or I by at least 2 classifiers, the post was labeled as such. Otherwise, the majority rule was applied.
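Both ensembling rules can be expressed in a few lines of Python. In the sketch below, the function names and the choice of the best-performing individual classifier are illustrative; each base classifier is assumed to have already produced a label (A, I, U, or N) for a given post.

```python
# Sketch of the two ensembling rules: simple majority voting with a tie-break,
# and a recall-boosting variant for the content-rich classes A and I.
from collections import Counter


def majority_vote(predictions: list, best_classifier_index: int = 0) -> str:
    """Most frequent label; ties go to the best-performing individual classifier."""
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:  # tie between top labels
        return predictions[best_classifier_index]
    return counts[0][0]


def recall_boosted_vote(predictions: list, best_classifier_index: int = 0) -> str:
    """Label a post A or I if at least 2 classifiers agree; otherwise use majority rule."""
    counts = Counter(predictions)
    for label in ("A", "I"):
        if counts[label] >= 2:
            return label
    return majority_vote(predictions, best_classifier_index)


print(majority_vote(["U", "A", "U", "I"]))        # -> "U"
print(recall_boosted_vote(["U", "A", "U", "A"]))  # -> "A"
```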
We used the best-performing classification strategy for all the unlabeled posts in the data set. Our goal was to study the distributions of abuse- and information-related social media chatter over time and geolocations, as past research has suggested that such analyses may reveal interesting trends.5,21,37
We compared the performances of the classifiers using the precision, recall, and microaveraged F1 or accuracy scores. The formulas for computing the metrics were as follows, with tp representing true positives; fn, false negatives; and fp, false positives:

$$\mathrm{precision} = \frac{tp}{tp + fp}, \qquad \mathrm{recall} = \frac{tp}{tp + fn}, \qquad F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$$

To compute the microaveraged F1 score, the tp, fp, and fn values for all of the classes are summed before calculating precision and recall. Formally,

$$F_1^{\mathrm{micro}} = F\!\left(\sum_{c \in M} tp_c,\; \sum_{c \in M} fp_c,\; \sum_{c \in M} fn_c\right),$$

in which F is the function to compute the metric, c is a label, and M is the set of all labels. For a multiclass problem such as this, the microaveraged F1 score and accuracy are equal. We computed 95% CIs for the F1 scores using the bootstrap resampling technique38 with 1000 resamples.
For geospatial analyses, we compared the abuse-indicating social media post rates from Pennsylvania with related metrics for the same period from 2 reference data sets: the WONDER database39 and the NSDUH.40 We obtained county-level yearly opioid overdose death rates from WONDER and percentages for 4 relevant substate-level measures (past month use of illicit drugs [no marijuana], past year nonmedical use of pain relievers, past year illicit drug dependence or abuse, and past year illicit drug dependence) from NSDUH. All the data collected were for the years 2012 to 2015. For the NSDUH measures, percentage values of annual means over the 3 years were obtained. We investigated the possible correlations (Pearson and Spearman) between the known metrics and the automatically detected abuse-indicating tweet rates and then visually compared them using geospatial heat maps and scatterplots.
For Pearson and Spearman correlation analyses, we used the Python library SciPy, version 1.3.1. Two-tailed P < .05 was interpreted as statistical significance.
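The correlation computation itself is a direct call to scipy.stats, as in the following sketch; the per-county rate values shown are illustrative placeholders rather than study data.

```python
# Sketch of the Pearson and Spearman correlation analysis using scipy.stats.
from scipy.stats import pearsonr, spearmanr

abuse_tweet_rates = [1.2, 0.8, 2.5, 3.1, 0.4, 1.9]          # toy per-county tweet rates
overdose_death_rates = [10.5, 7.2, 18.9, 22.4, 5.1, 15.0]    # toy per-county death rates

r_p, p_p = pearsonr(abuse_tweet_rates, overdose_death_rates)
r_s, p_s = spearmanr(abuse_tweet_rates, overdose_death_rates)
print(f"Pearson r={r_p:.3f} (P={p_p:.3f}); Spearman r={r_s:.3f} (P={p_s:.3f})")
```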
We used 56 expressions of illicit and prescription opioids for data collection, with a total of 213 keywords or phrases, including spelling variants (eTable 4 in the Supplement). The annotations resulted in a final set of 9006 social media posts (6304 [70.0%] for training, 900 [10.0%] for validation, and 1802 [20.0%] for testing). A set of 550 posts was annotated by both annotators, and interannotator agreement was moderate (Cohen κ41 = 0.75). Of the 9006 posts, 4830 (53.6%) were unrelated to opioids, 427 (4.7%) were not in the English language, and the proportions of abuse (1748 [19.4%]) and information (2001 [22.2%]) posts were similar (eTable 5 in the Supplement).
To capture the natural variation in the distribution of posts in real time, we did not stratify the sets by class during the training or testing set splitting. Consequently, the testing set consisted of a marginally lower proportion of abuse-indicating posts (17.7%) compared with the training set (19.8%). Statistically significant variation was found in the distribution of posts mentioning prescription opioids (2257 [25.1%]) and illicit opioids (7038 [78.1%]), with illicit opioid posts outnumbering prescription opioid posts at an approximate ratio of 3:1. Proportions of class A and class I tweets were much higher for prescription opioid tweets (24.7% vs 18.0% for class A; 30.4% vs 20.9% for class I), whereas the proportion of class U tweets was much higher for the illicit opioid posts (55.1% for illicit vs 44.5% for prescription) (see eTable 5 in the Supplement for post distributions per class).
Table 1 presents the performances of the classification algorithms, showing the recall, precision, and microaveraged F1 scores and 95% CIs. Among the traditional classifiers, support vector machines (F1 score = 0.700; 95% CI, 0.681-0.718) and random forests (F1 score = 0.701; 95% CI, 0.683-0.718) showed similar performances, outperforming the others in F1 scores. The deep convolutional neural network outperformed all of the traditional classifiers (F1 score = 0.720; 95% CI, 0.699-0.735). The resampling experiments did not improve performance of the individual classifiers. Both ensemble classification strategies shown in Table 1 performed better than the individual classifiers, with the simple majority voting ensemble of 4 classifiers (Ensemble_1) producing the best microaveraged F1 score (0.726; 95% CI, 0.708-0.743). Performances of the classifiers were high for class U and class N and low for class A.
The most common errors for the best-performing system (Ensemble_1) were incorrect classification to class U, comprising 145 (79.2%) of the 183 incorrect classifications for posts originally labeled as class A, 122 (67.4%) of the 181 incorrect classifications for posts labeled as class I, and all 4 (100%) of the incorrect classifications for posts labeled as class N (eTable 7 in the Supplement).
Temporal and Geospatial Analyses
Figure 1 shows the monthly frequency and proportion distributions of class A and I posts. The frequencies of both categories of posts increased over time, which was unsurprising given the growth in the number of daily active Twitter users over the 3 years of study as well as greater awareness about the opioid crisis. Greater awareness is perhaps also reflected by the increasing trend in information-related tweets. However, although the volume of abuse-related chatter increased, its overall proportion in all opioid-related chatter decreased over time, from approximately 0.055 to approximately 0.042. The true signals of opioid abuse from social media were likely hidden in large volumes of other types of information as awareness about the opioid crisis increased.
Figure 2 shows the similarities between 2 sets of county-level heat maps for population-adjusted, overdose-related death rates and abuse-indicating post rates as well as a scatterplot illustrating the positive association between the 2 variables. We found a statistically significant correlation (Pearson r = 0.451, P < .001; Spearman r = 0.331, P = .004) between the county-level overdose death rates and the abuse-indicating social media posts over 3 years (n = 75). In comparison, the pioneering study by Graves et al,5 perhaps the study most similar to ours, reported a maximum (among 50 topics) Pearson correlation of 0.331 between a specific opioid-related social media topic and county-level overdose death rates. In addition, we found that the Pearson correlation coefficient increased when the threshold for the minimum number of deaths for including counties was raised. If only counties with at least 50 deaths were included, the Pearson correlation coefficient increased to 0.54; for 100 deaths, the correlation coefficient increased to 0.67.
Figure 3 shows the substate-level heat maps for abuse-indicating social media posts and 4 NSDUH metrics over the same 3-year period, along with scatterplots for the 2 sets of variables. All the computed correlations and their significances are summarized in Table 2 (see eTable 6 in the Supplement for the substate information). Table 2 illustrates the consistently high correlations between abuse-indicating social media post rates and the NSDUH survey metrics over the same 3-year period (n = 13): nonmedical prescription opioid use (Pearson r = 0.683, P = .01; Spearman r = 0.346, P = .25), illicit drug use (Pearson r = 0.850, P < .001; Spearman r = 0.341, P = .25), illicit drug dependence (Pearson r = 0.937, P < .001; Spearman r = 0.495, P = .09), and illicit drug dependence or abuse (Pearson r = 0.935, P < .001; Spearman r = 0.401, P = .17). However, we could not establish statistical significance owing to the small sample sizes.
Opioid misuse or abuse and addiction are among the most consequential and preventable public health threats in the United States.42 Social media big data, coupled with advances in data science, present a unique opportunity to monitor the problem in near real time.20,37,43-45 Because of varying volumes of noise in generic social media data, the first requirement we believe needs to be satisfied for opioid toxicosurveillance is the development of intelligent, data-centric systems that can automatically collect and curate data, a requirement this cross-sectional study addressed. We explored keyword-based data collection approaches and proposed, through empirical evaluations, supervised machine learning methods for automatic categorization of social media chatter on Twitter. The best F1 score achieved was 0.726, which was comparable to human agreement.
Recent studies have investigated potential correlations between social media data and other sources, such as overdose death rates5 and NSDUH survey metrics.21 The primary differences between the current work and past studies are that we used a more comprehensive data collection strategy by incorporating spelling variants, and we applied supervised machine learning as a preprocessing step. Unlike purely keyword-based or unsupervised models,5,46,47 the approach we used appears to be robust at handling varying volumes of social media chatter, which is important when using social media data for monitoring and forecasting, given that the volume of data can be associated with factors such as movies or news articles, as suggested by Figure 1. The heat maps in Figures 2 and 3 show that the rates of abuse-related chatter were much higher in the more populous Pennsylvania counties (eg, Philadelphia and Allegheny), which was likely related to the social media user base being skewed to large cities. More advanced methods for adjusting or normalizing the data in large cities may further improve the correlations.
We also found that the correlation coefficient tended to increase when only counties with higher death rates were included. This finding suggests that Twitter-based classification may be more reliable for counties or geolocations with higher populations and therefore higher numbers of users. If this assertion is true, the increasing adoption of social media in recent years, specifically Twitter, is likely to aid the proposed approach. The correlations between social media post rates and the NSDUH metrics were consistently high, but statistical significance could not be established owing to the smaller sample sizes.
The proposed model we present in this study enables the automatic curation of opioid misuse–related chatter from social media despite fluctuating numbers of posts over time. The outputs of the proposed approach correlate with related measures from other sources and therefore may be used for obtaining near-real-time insights into the opioid crisis or for performing other analyses associated with opioid misuse or abuse.
Classification Error Analysis
As mentioned, the most common error made by the best-performing classifier (Ensemble_1) was to misclassify social media posts to class U, whereas misclassifications to the other 3 classes occurred with much lower frequencies (eTable 7 in the Supplement). We reviewed the confusion matrices from the other classifiers and saw a similar trend. Because class U was the majority class by a wide margin, it was the category to which the classifiers tended to assign posts that lacked sufficient context. The short length of certain posts and the presence of misspellings or rare nonstandard expressions made it difficult for the classifiers to decipher contextual cues, which was a major cause of classification errors.
Lack of context in posts also hindered the manual annotations, making the categorizations dependent on the subjective assessments of the annotators. Although the final agreement level between the annotators was higher than the levels in initial iterations, it could be improved. Our previous work suggests that preparing thorough annotation guidelines and elaborate annotation strategies for social media–based studies helps in obtaining relatively high annotator agreement levels and, eventually, improved system performances.48,49 We plan to address this issue in future research.
Another factor that affected the performance of the classifiers on class A and class I was data imbalance; the relatively low number of annotated instances for these classes made it difficult for algorithms to optimally learn. The resampling experiments were not associated with improved performances, which is consistent with findings from past research.49,50 Annotating more data is likely to produce improved performances for these classes. Given that several recent studies obtained knowledge from Twitter about opioid use or abuse, combining all the available data in a distant supervision framework may be valuable.51 We will also explore the use of sentence-level contextual embeddings, which have been shown to outperform past text classification approaches.52
In future research, we plan to expand this work to other classes of drugs and prescription medications, such as stimulants and benzodiazepines. Combining machine learning and available metadata, we will estimate the patterns of drug consumption and abuse over time and across geolocations and analyze cohort-level data, building on our previous work.53
This cross-sectional study has several limitations. First, we included only social media posts that originated from Pennsylvania. The advantage of machine learning over rule-based approaches is portability, but the possibly differing contents of social media chatter in different geolocations may reduce machine learning performance unless additional training data are added. Social media chatter is also always evolving, with new expressions introduced constantly. Therefore, systems trained with data from specific periods and geolocations may not perform optimally for other periods. The use of dense vector-based representations of texts may address this problem as semantic representations of emerging terms may be learned from large, unlabeled data sets without requiring human annotations.
Second, the moderate interannotator agreement in this study provided a relatively low ceiling for the machine learning classifier performance. More detailed annotation guidelines and strategies may address this problem by making the annotation process less subjective. Furthermore, the correlations we obtained did not necessarily indicate any higher-level associations between abuse-related social media posts and overdose death rates and/or survey responses.
Big data derived from social media such as Twitter present the opportunity to perform localized monitoring of the opioid crisis in near real time. In this cross-sectional study, we presented the building blocks for such social media–based monitoring by proposing data collection and classification strategies that employ natural language processing and machine learning.
Accepted for Publication: August 4, 2019.
Published: November 6, 2019. doi:10.1001/jamanetworkopen.2019.14672
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Sarker A et al. JAMA Network Open.
Corresponding Author: Abeed Sarker, PhD, Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Office 4101, Atlanta, GA 30322 (abeed@dbmi.emory.edu).
Author Contributions: Dr Sarker had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Sarker, Gonzalez-Hernandez, Perrone.
Acquisition, analysis, or interpretation of data: Sarker, Ruan.
Drafting of the manuscript: Sarker, Gonzalez-Hernandez.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Sarker, Ruan.
Administrative, technical, or material support: Gonzalez-Hernandez.
Supervision: Gonzalez-Hernandez, Perrone.
Conflict of Interest Disclosures: Dr Sarker reported receiving grants from the National Institute on Drug Abuse (NIDA), grants from Pennsylvania Department of Health, and nonfinancial support from NVIDIA Corporation during the conduct of the study as well as personal fees from the National Board of Medical Examiners, grants from the Robert Wood Johnson Foundation, and honorarium from the National Institutes of Health (NIH) outside the submitted work. Dr Gonzalez-Hernandez reported receiving grants from NIH/NIDA during the conduct of the study and grants from AbbVie outside the submitted work. No other disclosures were reported.
Funding/Support: This study was funded in part by award R01DA046619 from the NIH/NIDA. The data collection and annotation efforts were partly funded by a grant from the Pennsylvania Department of Health.
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of NIDA or NIH.
Additional Contributions: Karen O’Connor, MS, and Alexis Upshur, BS, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, and Annika DeRoos, College of Arts and Sciences, University of Pennsylvania, performed the annotations. Mss O’Connor and Upshur received compensation for their contributions as staff researchers, and Ms DeRoos received compensation as a sessional research assistant under the mentorship of Dr Sarker. The Titan Xp GPU used for the deep learning experiments was donated by the NVIDIA Corporation.
1. National Academies of Sciences, Engineering, and Medicine; Health and Medicine Division; Board on Health Sciences Policy; Committee on Pain Management and Regulatory Strategies to Address Prescription Opioid Abuse. Pain Management and the Opioid Epidemic: Balancing Societal and Individual Benefits and Risks of Prescription Opioid Use. Washington, DC: National Academies Press; 2017.
16. Phan N, Chun SA, Bhole M, Geller J. Enabling real-time drug abuse detection in tweets. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE). Piscataway, NJ: IEEE; 2017.
23. Buntain C, Golbeck J. This is your Twitter on drugs. Any questions? In: Proceedings of the 24th International Conference on World Wide Web. WWW ’15 Companion. New York, NY: ACM; 2015:777-782.
25. Tufts C, Polsky D, Volpp KG, et al. Characterizing tweet volume and content about common health conditions across Pennsylvania: retrospective analysis. JMIR Public Health Surveill. 2018;4(4):e10834. doi:10.2196/10834
26. Wang Y, Callan J, Zheng B. Should we use the sample? analyzing datasets sampled from Twitter’s stream API. ACM Trans Web. 2015;3(13):1-23. doi:10.1145/2746366
32. Sarker A, Gonzalez-Hernandez G, Perrone J. Towards automating location-specific opioid toxicosurveillance from Twitter via data science methods. Stud Health Technol Inform. 2019;264:333-337. doi:10.3233/SHTI190238
34. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems 26 (NIPS 2013). San Diego, CA: Neural Information Processing Systems Foundation Inc; 2013:1-9.
37. Hanson CL, Burton SH, Giraud-Carrier C, West JH, Barnes MD, Hansen B. Tweaking and tweeting: exploring Twitter for nonmedical use of a psychostimulant drug (Adderall) among college students. J Med Internet Res. 2013;15(4):e62. doi:10.2196/jmir.2503
43. Katsuki T, Mackey TK, Cuomo R. Establishing a link between prescription drug abuse and illicit online pharmacies: analysis of Twitter data. J Med Internet Res. 2015;17(12):e280. doi:10.2196/jmir.5144
47. Sharpe JD, Hopkins RS, Cook RL, Striley CW. Evaluating Google, Twitter, and Wikipedia as tools for influenza surveillance using bayesian change point analysis: a comparative analysis. JMIR Public Health Surveill. 2016;2(2):e161. doi:10.2196/publichealth.5901
48. Klein A, Sarker A, Rouhizadeh M, O’Connor K, Gonzalez G. Detecting personal medication intake in Twitter: an annotated corpus and baseline classification system. In: Proceedings of the BioNLP 2017 Workshop. Vancouver, Canada: Association for Computational Linguistics; 2017:136-142.
49. Sarker A, Belousov M, Friedrichs J, et al. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task. J Am Med Inform Assoc. 2018;25(10):1274-1283. doi:10.1093/jamia/ocy114
50. Klein AZ, Sarker A, Weissenbacher D, Gonzalez-Hernandez G. Automatically detecting self-reported birth defect outcomes on Twitter for large-scale epidemiological research [published online October 22, 2018]. arXiv. doi:10.1038/s41746-019-0170-5
51. Sahni T, Chandak C, Chedeti NR, Singh M. Efficient Twitter sentiment classification using subjective distant supervision. In: 2017 9th International Conference on Communication Systems and Networks (COMSNETS). Piscataway, NJ: IEEE; 2017:548-553. doi:10.1109/COMSNETS.2017.7945451
52. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT 2019. Minneapolis, MN: Association for Computational Linguistics; 2019:4171-4186.
53. Sarker A, Chandrashekar P, Magge A, Cai H, Klein A, Gonzalez G. Discovering cohorts of pregnant women from social media for safety surveillance and analysis. J Med Internet Res. 2017;19(10):e361. doi:10.2196/jmir.8164