Sentiments and Emotion in Doppelgänger Tweet Text

InfoEpi Lab

Building the Term Document Matrix

A Term Document Matrix (TDM) is a matrix that records how often each term occurs in each document of a collection. In a TDM, rows correspond to terms and columns correspond to documents; the transposed layout, with documents as rows, is usually called a Document Term Matrix (DTM). Each cell in the matrix holds the frequency of a term in a particular document.
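
As a minimal sketch of the structure described above, the following base R code builds a small term document matrix by hand. The three documents and the names `docs`, `tokens`, `vocab`, and `tdm` are hypothetical; this is not the pipeline used for the tweet data, and in practice a package function such as `TermDocumentMatrix()` from tm would do this work.

```r
# Toy documents (hypothetical, not taken from the tweet dataset)
docs <- c(d1 = "ukraine needs time",
          d2 = "time for change",
          d3 = "ukraine must change")

# Split each document into lowercase word tokens
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Vocabulary: every distinct term, sorted alphabetically
vocab <- sort(unique(unlist(tokens)))

# One row per term, one column per document; each cell is a term count
tdm <- sapply(tokens, function(tok) table(factor(tok, levels = vocab)))
tdm

# Overall term frequencies, most common first
term_freq <- sort(rowSums(tdm), decreasing = TRUE)
term_freq
```

The sorted `term_freq` vector is the kind of input a bar chart or word cloud of the most common words would start from.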

Visualizing the Most Common Words

Generate a Bar Chart

Generate a Word Cloud

Topic Modeling

Determine the ideal number of topics, then identify the topics themselves.

fit models... done.
calculate metrics:
  CaoJuan2009... done.
  Arun2010... done.
  Deveaud2014... done.

The CaoJuan2009 and Arun2010 metrics, for which lower values are better, suggest a small number of topics, with 3 and 5 topics respectively being points of interest.
Deveaud2014, for which higher values are better, suggests that even fewer topics (2) might be optimal.
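
The reading of the metrics above can be sketched in base R. The data frame below is hypothetical: its column layout is assumed to mirror a `FindTopicsNumber()` result from ldatuning, and its numbers are made up to echo the pattern described in the text, not taken from this analysis.

```r
# Hypothetical metric values (illustrative only, not the values from this post)
metrics <- data.frame(
  topics      = 2:6,
  CaoJuan2009 = c(0.30, 0.12, 0.18, 0.20, 0.25),
  Arun2010    = c(9.0, 7.5, 6.8, 6.1, 6.6),
  Deveaud2014 = c(1.9, 1.6, 1.4, 1.2, 1.0)
)

# Lower is better for CaoJuan2009 and Arun2010
best_cao  <- metrics$topics[which.min(metrics$CaoJuan2009)]
best_arun <- metrics$topics[which.min(metrics$Arun2010)]

# Higher is better for Deveaud2014
best_dev <- metrics$topics[which.max(metrics$Deveaud2014)]

c(CaoJuan2009 = best_cao, Arun2010 = best_arun, Deveaud2014 = best_dev)
```

When the metrics disagree, as they do here, the final choice of topic count remains a judgment call informed by how interpretable the resulting topics are.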

Top five stemmed terms per topic for a four-topic model:

     Topic 1   Topic 2    Topic 3  Topic 4
[1,] "corrupt" "macron"   "ukrain" "ukrain"  
[2,] "time"    "must"     "need"   "situat"  
[3,] "countri" "ukrain"   "time"   "seem"    
[4,] "govern"  "unitedst" "must"   "unitedst"
[5,] "must"    "see"      "chang"  "macron"

Sentiment Analysis in R

Sentiments in texts can be classified as positive, neutral, or negative. They can also be quantified using a numerical scale to express the intensity of the sentiment.
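
To illustrate how a numerical scale can arise, here is a simplified sketch of lexicon-based scoring in base R: each word is looked up in a dictionary of weights and the weights are summed. The lexicon, the function name `score_text`, and the example texts are all hypothetical, and real lexicon methods handle tokenization and vocabulary far more carefully.

```r
# Toy lexicon (hypothetical words and weights, not a real sentiment dictionary)
lexicon <- c(good = 1, great = 2, bad = -1, terrible = -2)

score_text <- function(text, lexicon) {
  # Lowercase the text and split on anything that is not a letter
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Look up each word; words missing from the lexicon yield NA
  values <- lexicon[words]
  # Sum the matched weights, treating unmatched words as contributing nothing
  sum(values, na.rm = TRUE)
}

texts <- c("A great day", "Bad and terrible news", "Nothing to report")
scores <- vapply(texts, score_text, numeric(1),
                 lexicon = lexicon, USE.NAMES = FALSE)
scores
```

A positive total marks a text as positive, a negative total as negative, and zero as neutral, while the magnitude gives a rough sense of intensity.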

Sentiment Analysis using Syuzhet Method

Extract sentiment scores and view initial elements and summaries.

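# Load the syuzhet package, which provides get_sentiment()
library(syuzhet)
# Calculate sentiments using the Syuzhet method
# (`text` is assumed to be the character vector of tweet texts)
syuzhet_vector <- get_sentiment(text, method="syuzhet")
# Display first few entries of the sentiment scores
head(syuzhet_vector)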

[1]  1.30  0.00 -0.60  0.65 -1.00 -0.25

# Generate summary statistics for the Syuzhet sentiment scores
summary(syuzhet_vector)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-4.5000 -0.7500 -0.1250 -0.1324  0.5000  3.9000

Sentiment Analysis using Bing Method

Apply the Bing method, inspect the first few entries, and summarize.

# Calculate sentiments using the Bing method
bing_vector <- get_sentiment(text, method="bing")
# Display first few entries
head(bing_vector)

[1]  2  0 -1  0  0 -1

# Summary statistics
summary(bing_vector)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-4.0000 -1.0000  0.0000 -0.3594  0.0000  3.0000

Sentiment Analysis using AFINN Method

Apply the AFINN method, then examine the first few scores and the summary statistics.

# Calculate sentiments using the AFINN method
afinn_vector <- get_sentiment(text, method="afinn")
# Display first few entries
head(afinn_vector)

[1]  3  0 -1  2 -3 -2

# Summary statistics
summary(afinn_vector)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-11.0000  -2.0000   0.0000  -0.6894   1.0000  10.0000

Signs of the first six scores from each method (rows: Syuzhet, Bing, AFINN):

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0   -1    1   -1   -1
[2,]    1    0   -1    0    0   -1
[3,]    1    0   -1    1   -1   -1

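Bing Method: This method scores each word on a binary scale, and a text's score is the sum of its word scores (which is why the totals above run from -4 to 3). For each word: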

-1 represents negative sentiment
+1 denotes positive sentiment

AFINN Method: This approach scores each word on an integer scale and sums the word scores for each text. Word scores range from:

-5 (most negative)
+5 (most positive)

Syuzhet Method: This technique uses the syuzhet package's default lexicon, developed in the Nebraska Literary Lab, which scores each word on a decimal scale from -1 (most negative) to +1 (most positive). This is why its text scores are fractional. It should not be confused with the NRC emotion lexicon, a separate method that associates words with eight emotions and two sentiments and is covered under Emotion Analysis.

To compare the results from different methods, their outputs must first be put on a common scale, because each method uses a different rating system. A practical approach in R is the sign function, which:

Converts all positive numbers to 1
Converts all negative numbers to -1
Keeps zero values unchanged as 0

This simplification discards intensity and keeps only polarity, which allows direct comparison across the sentiment analysis methods.
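
The normalization described above can be sketched with the first six scores reported for each method earlier in this post. The variable names are illustrative; `sign()` and `rbind()` are base R functions.

```r
# First six scores per method, copied from the head() outputs in this post
syuzhet_head <- c(1.30, 0.00, -0.60, 0.65, -1.00, -0.25)
bing_head    <- c(2, 0, -1, 0, 0, -1)
afinn_head   <- c(3, 0, -1, 2, -3, -2)

# Reduce every score to its polarity and stack the methods as rows
signs <- rbind(sign(syuzhet_head), sign(bing_head), sign(afinn_head))
signs

# Share of these six texts on which all three methods agree in polarity
agreement <- mean(signs[1, ] == signs[2, ] & signs[2, ] == signs[3, ])
agreement
```

On these six texts the methods agree four times out of six; both disagreements are cases where Bing returns 0 while the other two lexicons register a polarity.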

Emotion Analysis

The NRC Word-Emotion Association Lexicon (EmoLex) facilitates the classification of words according to their association with various emotions and sentiments. EmoLex categorizes English words into eight distinct emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Further details on EmoLex can be found on Saif Mohammad’s website.

The get_nrc_sentiment function generates a data frame in which each row corresponds to one element of the analyzed text vector, such as a single tweet. This data frame has ten columns: one for each of the eight emotions and one for each of the two sentiment valences.

   anger anticipation disgust fear joy sadness surprise trust negative positive
1      0            0       0    0   0       0        0     0        0        2
2      0            1       0    0   0       0        0     1        1        0
3      0            1       0    0   1       0        0     1        1        1
4      1            2       0    1   1       1        1     2        1        3
5      0            0       0    0   0       0        0     1        0        0
6      0            1       0    1   0       1        0     0        1        1
7      0            1       0    2   0       2        0     1        2        1
8      1            0       0    1   1       1        0     1        1        2
9      0            0       0    1   0       1        0     1        2        1
10     2            0       1    1   1       1        1     1        1        1

The next step is to create two charts to help visually analyze the emotions in the tweet text. The first tallies the total number of words in the text associated with each of the eight emotions.

The second expresses these counts as proportions of the whole, showing what share of the emotion-bearing words was categorized under each emotion.
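
As a rough sketch of the tallying and proportion steps, the code below uses only the ten rows displayed above, typed in by hand, so its totals cover a small sample rather than the full dataset. The name `nrc_head` is hypothetical.

```r
# The ten rows shown above, entered by hand (eight emotion columns only)
nrc_head <- data.frame(
  anger        = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 2),
  anticipation = c(0, 1, 1, 2, 0, 1, 1, 0, 0, 0),
  disgust      = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
  fear         = c(0, 0, 0, 1, 0, 1, 2, 1, 1, 1),
  joy          = c(0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
  sadness      = c(0, 0, 0, 1, 0, 1, 2, 1, 1, 1),
  surprise     = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  trust        = c(0, 1, 1, 2, 1, 0, 1, 1, 1, 1)
)

# Total word count associated with each emotion
totals <- colSums(nrc_head)
totals

# The same counts as shares of all emotion-associated words
shares <- prop.table(totals)
round(shares, 3)
```

Passing `totals` and `shares` to `barplot()` would give the count chart and the proportion chart, respectively.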

Citation

BibTeX citation:

@article{infoepi_lab2024,
  author = {{InfoEpi Lab}},
  publisher = {Information Epidemiology Lab},
  title = {Sentiments and {Emotion} in {Doppelgänger} {Tweet} {Text}},
  journal = {InfoEpi Lab},
  date = {2024-05-08},
  url = {https://infoepi.org/posts/2024/05/08-doppelganger-tweet-text.html},
  langid = {en}
}

For attribution, please cite this work as:

InfoEpi Lab. 2024. “Sentiments and Emotion in Doppelgänger Tweet Text.” InfoEpi Lab, May. https://infoepi.org/posts/2024/05/08-doppelganger-tweet-text.html.