3 - Sentiment Analysis

Application of two-way sentiment analysis based on the Bing Liu lexicon.

Lars Hinrichs https://larshinrichs.site (The University of Texas at Austin)https://liberalarts.utexas.edu/english
10-16-2020
Sentiments!

Figure 1: Sentiments!

Tagging the data

We continue working with the previous version of the data – with one word per row, with act, speaker, and genre marked – and now add: (a) stopword removal and (b) neg./pos. sentiment tagging.

We must concede that, although we have a version of the text here that is orthographically modernized, it’s still written in an old form of English with a number of words that are not included in a modern-day sentiment lexicon such as Bing Liu’s.

We can output a version of the text that has had the stopwords removed, and then shows us the words that were not mapped to a term in the sentiment dictionary (in other words, they were found neither in the stopwords list nor in the sentiment lists).

 [1] "richard"      "conqueror"    "paucas"       "pallabris"   
 [5] "world"        "slide"        "sessa"        "pay"         
 [9] "glasses"      "burst"        "denier"       "jeronimy"    
[13] "thy"          "bed"          "thee"         "fetch"       
[17] "thirdborough" "fourth"       "fift"         "borough"     
[21] "answer"       "law"          "budge"        "inch"        
[25] "boy"          "huntsman"    

It’s not too terrible. Well – thee, thy etc. should have been removed as stopwords, but it’s alright. We can apply the method and trust it will tell us something informative Here, then, is the top of the list of words that were sentiment-tagged.

Table 1: Top of the sentiment-tagged text object.
speaker word sentiment linenumber act genre
SLY faith positive 1 1 prose
HOSTESS rogue negative 2 1 prose
SLY saint positive 9 1 prose
SLY cold negative 10 1 prose
SLY warm positive 10 1 prose
HOSTESS remedy positive 11 1 prose
SLY kindly positive 15 1 prose
LORD tender positive 16 1 verse
LORD poor negative 17 1 verse
LORD hedge negative 20 1 verse
LORD fault negative 20 1 verse
LORD lose negative 21 1 verse

Sentiment by Genre and Act

By Act

First, an analysis that foregrounds textual structure more than character. The data is aggregated accordingly…

Table 2: Sentiment scores per act.
act sentiment n total_tagged prop
1 negative 194 416 0.466
1 positive 222 416 0.534
2 negative 108 214 0.505
2 positive 106 214 0.495
3 negative 90 159 0.566
3 positive 69 159 0.434

…and then plotted.

Sentiment per act.

Figure 2: Sentiment per act.

By Act and Genre

First the data is converted…

Table 3: Sentiment scores per act and genre.
act genre sentiment n total_tagged prop
1 prose negative 36 71 0.507
1 prose positive 35 71 0.493
1 verse negative 158 345 0.458
1 verse positive 187 345 0.542
2 prose negative 4 10 0.400
2 prose positive 6 10 0.600
2 verse negative 104 204 0.510
2 verse positive 100 204 0.490

…and then plotted.

Sentiment per act and genre. Token counts are high enough for each act so the Ns aren't reported (but cp. other plots below).

Figure 3: Sentiment per act and genre. Token counts are high enough for each act so the Ns aren’t reported (but cp. other plots below).

Sentiment by Character

We can produce a value for the proportion of neg. and pos. terms out of the total number of terms tagged for each character.

Table 4: Sentiment scores per character
speaker sentiment n total_tagged prop
BAPTISTA negative 41 79 0.519
BAPTISTA positive 38 79 0.481
BIANCA negative 19 36 0.528
BIANCA positive 17 36 0.472
BIONDELLO negative 21 43 0.488
BIONDELLO positive 22 43 0.512

Let’s visualize these. Using proportions (or percentages) means that all the characters are displayed on the same scale, even though some of them have very, very few words. When a character has only a couple of words scored, there’s a bit of significance issue. Therefore I’ll include the number of words scored per character in the visualization: it will be proportional to the width of the bars for each character.

Proportion of sentiment scores per character. Label in white shows number of items scored per character as an indicator of reliability

Figure 4: Proportion of sentiment scores per character. Label in white shows number of items scored per character as an indicator of reliability

Sentiment by Character and Genre

For this analysis, we break down the aggregate sentiment scores by character and genre. That is to say, we’ll end up with a dataset that contains (up to) four rows per character: it splits two ways for genre and then two ways for sentiment.

Table 5: Sentiment scores aggregated by character and genre.
speaker genre sentiment n total_tagged prop
BAPTISTA prose negative 5 7 0.714
BAPTISTA prose positive 2 7 0.286
BAPTISTA verse negative 36 72 0.500
BAPTISTA verse positive 36 72 0.500
BIANCA prose negative 1 2 0.500
BIANCA prose positive 1 2 0.500
BIANCA verse negative 18 34 0.529
BIANCA verse positive 16 34 0.471
BIONDELLO prose negative 18 35 0.514
BIONDELLO prose positive 17 35 0.486
BIONDELLO verse negative 3 8 0.375
BIONDELLO verse positive 5 8 0.625

Let us visualize these new numbers. A caveat: when there are too few items in a category (anything under about 4 has no statistical meaning), results must be taken with more than a grain of salt. Findings for the characters with more words can be trusted, however.

Crossing sentiment scores several ways.

Figure 5: Crossing sentiment scores several ways.

Citation

For attribution, please cite this work as

Hinrichs (2020, Oct. 16). Genre and Character in The Taming of the Shrew: 3 - Sentiment Analysis. Retrieved from https://titus-and-shrew.netlify.app

BibTeX citation

@misc{hinrichs2020shrew-3,
  author = {Hinrichs, Lars},
  title = {Genre and Character in The Taming of the Shrew: 3 - Sentiment Analysis},
  url = {https://titus-and-shrew.netlify.app},
  year = {2020}
}