Use of Deep Learning to Analyze Social Media Discussions About the Human Papillomavirus Vaccine

Key Points

Question: Can public perceptions of the human papillomavirus (HPV) vaccine be assessed from the perspective of behavior change theories by mining social media data with machine learning algorithms?

Findings: This cohort study included 1 431 463 English-language posts about the HPV vaccine from 486 116 unique usernames on a social media platform. An increase in HPV vaccine–related discussions was found, and the results suggest temporal and geographic variations in public perceptions of the HPV vaccine.

Meaning: The findings of this study suggest that social media data and machine learning algorithms can serve as a complementary approach to inform public health surveillance and understanding and can help in designing targeted educational and communication programs that increase HPV vaccine acceptance.


Glossary of Terms

Artificial Intelligence (AI): AI is a broad branch of computer science that builds intelligent machines capable of performing tasks that typically require human intelligence.

Natural Language Processing (NLP): NLP is a field of AI concerned with enabling computers to understand natural language.

Gold Standard Corpus: A robust dataset, manually curated by domain experts, that is used for training and evaluating computational algorithms.

Machine Learning (ML): ML is an application of AI that gives computers the ability to learn and improve through experience without being explicitly programmed.

Feature Engineering: Feature engineering is the process of extracting, from the original data, numeric features that can be directly processed by ML algorithms.

Deep Learning (DL): DL is a subfield of ML that is based on artificial neural network algorithms.

Recurrent Neural Network (RNN): An RNN is a DL model that contains loops within the network, which allow information to be stored.

Long Short-Term Memory (LSTM): LSTM is a variation of the RNN that helps alleviate the vanishing gradient problem by using a "gate" mechanism to retain "memories" of prior time steps.

Word Embedding: A word embedding is a learned word representation that maps words to high-dimensional vectors, where similar words have similar encodings.

Softmax Layer: A softmax layer is typically the final output layer in a deep learning model that performs multiclass classification; it uses the softmax function to generate a probability distribution over all classes (a numeric sketch follows this glossary).

Attention Layer: An attention layer is an effective component of a deep neural network that selectively concentrates on the most relevant parts of a sequence.
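To make the softmax entry above concrete, the following minimal NumPy sketch (with purely illustrative scores) converts raw class scores into a probability distribution; this is the operation that the output layer of each classifier described below performs.

import numpy as np

def softmax(scores):
    """Map raw scores (logits) to a probability distribution that sums to 1."""
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print(softmax(logits))              # -> approximately [0.659, 0.242, 0.099]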
fastText is a more recent word-embedding method, which is based on the skip-gram model. However, contrary to word2vec, in which the morphology of words is ignored, fastText represents each word as a bag of character n-grams, and a vector representation is associated with each character n-gram.
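To illustrate the bag-of-character-n-grams idea, the short Python sketch below decomposes a word into the 3- to 6-character n-grams that fastText would use (including the boundary markers < and > that fastText adds); the function name and parameter values are illustrative, not from the original study.

def char_ngrams(word, n_min=3, n_max=6):
    """Decompose a word into character n-grams, fastText style."""
    wrapped = f"<{word}>"      # fastText wraps each word in boundary markers
    ngrams = {wrapped}          # the full word itself also gets a vector
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.add(wrapped[i:i + n])
    return sorted(ngrams)

# char_ngrams("vaccine") includes "<va", "vac", "acc", ..., "ine>";
# the word's vector is built from the vectors of these n-grams.
print(char_ngrams("vaccine", 3, 4))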
Twitter word embeddings were trained by applying the above three models to the unlabeled HPV-related Twitter corpus (~1.4 million tweets); these were termed W2V HPV, GloVe HPV, and FT HPV. For all three models, the window size was set to 5, the maximum number of iterations to 20, and the dimension size to 200. The use of these Twitter word embeddings was evaluated on a recurrent neural network with an attention mechanism (Att-RNN).9 For comparison purposes, the use of the pretrained 200-dimension GloVe Twitter embedding (trained on 2 billion tweets from the general domain, termed GloVe General) and of a random 200-dimension embedding was also evaluated.
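As a sketch of how such domain-specific embeddings can be trained with the hyperparameters reported above (window 5, 20 iterations, 200 dimensions), the following uses the gensim library; the corpus file name is hypothetical, and GloVe HPV would instead be trained with the Stanford GloVe toolkit, which gensim does not provide.

from gensim.models import Word2Vec, FastText

# Hypothetical corpus file: one pre-tokenized tweet per line.
with open("hpv_tweets.txt", encoding="utf-8") as f:
    tweets = [line.split() for line in f]

# W2V HPV: skip-gram word2vec with the reported hyperparameters.
w2v_hpv = Word2Vec(tweets, vector_size=200, window=5, epochs=20, sg=1)

# FT HPV: fastText, which adds character n-gram (subword) information.
ft_hpv = FastText(tweets, vector_size=200, window=5, epochs=20, sg=1)

w2v_hpv.wv.save_word2vec_format("w2v_hpv.vec")
ft_hpv.wv.save_word2vec_format("ft_hpv.vec")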
The performance of the different word-embedding techniques can be seen in the eMethods. In addition to Att-RNN, to demonstrate the superiority of the proposed model, we further evaluated several competitive traditional machine learning algorithms and two deep learning–based algorithms (i.e., Att-ELMo and BERT) as comparisons.
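For orientation, the following Keras sketch shows the general shape of an attention-based bidirectional RNN classifier of the Att-RNN family; it is a minimal illustration in which the layer sizes, sequence length, and class count are placeholder assumptions, not the authors' exact architecture.

import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB, EMB_DIM, N_CLASSES = 50, 20000, 200, 3  # placeholder sizes

inp = layers.Input(shape=(MAX_LEN,))
# In practice this layer would be initialized with pretrained vectors
# such as FT HPV rather than trained from scratch.
emb = layers.Embedding(VOCAB, EMB_DIM)(inp)
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(emb)  # one hidden state per token
score = layers.Dense(1, activation="tanh")(h)  # unnormalized attention score per time step
alpha = layers.Softmax(axis=1)(score)          # attention weights sum to 1 over the sequence
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alpha])
out = layers.Dense(N_CLASSES, activation="softmax")(context)  # softmax output layer

att_rnn = Model(inp, out)
att_rnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])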
Several classic machine learning algorithms (e.g., support vector machines, logistic regression, random forest) were tested, and extremely randomized trees (ERT)15 were chosen as the baseline algorithm because of their better performance on most of the tasks. Two types of features were evaluated: (1) mean embedding, in which all of the tokens were mapped to high-dimensional vectors using a pretrained word embedding (FT HPV was used in this study) and the averaged word vectors over all words in each tweet were taken as the feature (termed mean-emb); and (2) term frequency–inverse document frequency (TF-IDF), a numerical statistic intended to reflect how important a word is to a document in a corpus.16

Att-ELMo is an attentive sequence model based on Embeddings from Language Models (ELMo).17 Traditional word embedding methods assign a static high-dimensional vector to a word regardless of its context; however, a word can have multiple context-dependent meanings. ELMo is a deep contextualized word embedding method that considers the entire context before assigning each word its embedding vector. Att-ELMo first adopts the pretrained ELMo (loaded from https://tfhub.dev/google/elmo/2) to map each word in the tweet to high-dimensional vectors. Then, similar to Att-RNN, the word vectors are fed to a bidirectional RNN, followed by the attention mechanism. A softmax layer serves as the output layer for classification.
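A minimal sketch of the two baseline feature pipelines, using scikit-learn's ExtraTreesClassifier: the tweet and label arrays are placeholders, the embedding file name is hypothetical, and gensim's KeyedVectors stands in for loading the FT HPV embedding.

import numpy as np
from gensim.models import KeyedVectors
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["hpv vaccine protects against cancer", "not sure about the hpv shot"]  # placeholder data
labels = [1, 0]

# Feature 1: mean-emb -- average the pretrained vectors of all words in a tweet.
ft_hpv = KeyedVectors.load_word2vec_format("ft_hpv.vec")  # hypothetical embedding file
def mean_emb(tweet):
    vecs = [ft_hpv[w] for w in tweet.split() if w in ft_hpv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(ft_hpv.vector_size)
X_emb = np.vstack([mean_emb(t) for t in tweets])

# Feature 2: TF-IDF -- sparse word-importance weights over the corpus.
X_tfidf = TfidfVectorizer().fit_transform(tweets)

# ERT baseline: an ensemble of extremely randomized trees.
ert = ExtraTreesClassifier(n_estimators=500).fit(X_emb, labels)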
BERT stands for Bidirectional Encoder Representations from Transformers; it is a language representation model based on the Transformer architecture.18 The Transformer relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs (e.g., LSTM).19 Contrary to recurrent models, the Transformer allows for significantly more parallelization. BERT has achieved state-of-the-art performance in many natural language processing tasks.18 A pretrained BERT model can be fine-tuned for other tasks with just one additional layer. The pretrained BERT model (BERT-Large, Uncased) was loaded and fine-tuned on the present study's Twitter text classification tasks.
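As an illustration of this fine-tuning pattern (not the study's original pipeline), the sketch below fine-tunes bert-large-uncased for one step with the Hugging Face transformers library; the dataset variables and the class count are placeholder assumptions.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-large-uncased")
# One added classification head on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=3)

tweets = ["the hpv vaccine saved lives", "worried about hpv vaccine side effects"]  # placeholders
labels = torch.tensor([1, 0])

batch = tok(tweets, padding=True, truncation=True, max_length=64, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)  # loss is computed against the added classification head
out.loss.backward()
optimizer.step()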
The comparison of the different classification algorithms can be seen in eMethods Table 4. BERT, a recent breakthrough in NLP, has advanced state-of-the-art performance in multiple general-domain NLP tasks.18