2 - Some Numbers

Descriptive statistics about the text’s dimensions by genre, speaker, and act.

Lars Hinrichs (https://larshinrichs.site), The University of Texas at Austin (https://liberalarts.utexas.edu/english)
10-17-2020

Descriptive Statistics

Everything we’ve done to the text so far, including the hand coding, falls under text mining. By counting and visualizing the distributions of our variables of interest, we now move into what data scientists call descriptive statistics. (The next level of statistical complexity would be predictive statistics, which goes by names such as modeling, machine learning, or AI.)

Many digital text analysis projects are best served by careful descriptive statistics, and this is one of them.

Research Questions

Received from Doug Bruster via email.

  1. How many words of verse are there in this play?
  2. How many words of prose?
  3. How many of each does each character speak?
  4. What is the overall sentiment of the verse speeches?
  5. What is the overall sentiment of the prose speeches?
  6. Is there a discrepancy in the sentiment of the verse a certain character speaks as opposed to the sentiment of the prose he or she speaks (that is, a wider or narrower gap than the average difference of the two media)?
  7. Are verse and prose made up of different kinds of words (for example, are the words in one medium longer in terms of character count than the other)?
  8. Are the lexicons of verse and prose ordered differently? (That is, do the most frequent words differ from one to the other?)
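Questions 7 and 8 reduce to simple per-genre computations once each speech is coded as verse or prose. The following is a minimal Python sketch. It assumes speeches are available as hypothetical `(genre, text)` pairs and tokenizes with a simple regular expression. Both are assumptions for illustration, and the project's own tokenization may differ. The sentences are placeholders, not lines from the play.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical toy data: (genre, speech text). Placeholder sentences,
# not lines from the play.
speeches = [
    ("verse", "The quick brown fox,"),
    ("verse", "the lazy dog."),
    ("prose", "A cat and a dog!"),
]

def tokens(text):
    """Lowercased word tokens; punctuation is dropped, apostrophes kept."""
    return re.findall(r"[a-z']+", text.lower())

def genre_tokens(rows, genre):
    """All tokens from speeches coded as `genre`."""
    return [w for g, text in rows if g == genre for w in tokens(text)]

def mean_word_length(rows, genre):
    """Question 7: average number of characters per word in one genre."""
    return mean(len(w) for w in genre_tokens(rows, genre))

def top_words(rows, genre, n=3):
    """Question 8: the n most frequent words in one genre, with counts."""
    return Counter(genre_tokens(rows, genre)).most_common(n)

for genre in ("verse", "prose"):
    print(genre, round(mean_word_length(speeches, genre), 2), top_words(speeches, genre))
```

Comparing the two `top_words` lists side by side gives a first, informal answer to question 8. A rank-based comparison of the full frequency lists would be a natural next step.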

Text Volume by Genre, Character, and Act

Character

First, let’s see how many words each character speaks.

Figure 1: Number of words per character.

Genre

Figure 2: Number of words per genre.
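The totals behind Figures 1 and 2, and behind research questions 1 and 2, come down to summing word counts over one grouping column. This minimal sketch makes several hypothetical assumptions:

* each speech is a row with `character`, `genre`, `act`, and `text` fields;
* words are whitespace-delimited;
* the rows are placeholders, not counts from the play.

```python
from collections import Counter

# Hypothetical toy data: one row per speech. Field names and placeholder
# text are assumptions, not the project's actual data.
speeches = [
    {"character": "PETRUCHIO", "genre": "verse", "act": 2, "text": "one two three four five"},
    {"character": "KATHARINA", "genre": "verse", "act": 2, "text": "one two three"},
    {"character": "GRUMIO", "genre": "prose", "act": 1, "text": "one two three four"},
    {"character": "PETRUCHIO", "genre": "prose", "act": 4, "text": "one two"},
]

def word_totals(rows, field):
    """Sum whitespace-delimited word counts for each value of `field`."""
    totals = Counter()
    for row in rows:
        totals[row[field]] += len(row["text"].split())
    return totals

print(word_totals(speeches, "character").most_common())  # per character (Figure 1)
print(word_totals(speeches, "genre").most_common())      # per genre (Figure 2)
```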

Genre and Character

Figure 3: Number of words per character and genre.
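The per-character, per-genre breakdown in Figure 3, which also answers research question 3, is a two-way table of word counts. The following is a minimal sketch. It assumes hypothetical speech rows with `character`, `genre`, `act`, and `text` fields, whitespace-delimited words, and placeholder data.

```python
from collections import defaultdict

# Hypothetical toy data: one row per speech, placeholder text only.
speeches = [
    {"character": "PETRUCHIO", "genre": "verse", "act": 2, "text": "one two three four five"},
    {"character": "KATHARINA", "genre": "verse", "act": 2, "text": "one two three"},
    {"character": "GRUMIO", "genre": "prose", "act": 1, "text": "one two three four"},
    {"character": "PETRUCHIO", "genre": "prose", "act": 4, "text": "one two"},
]

def words_by_character_and_genre(rows):
    """Nested totals of the form {character: {genre: word count}}."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row["character"]][row["genre"]] += len(row["text"].split())
    return table

table = words_by_character_and_genre(speeches)
print("character", "verse", "prose")
for character in sorted(table):
    # .get() reports 0 for a genre the character never uses, without adding a key.
    print(character, table[character].get("verse", 0), table[character].get("prose", 0))
```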

Below is the same chart, restricted to characters who use both prose and verse.

Genre and Character by Act

Let us focus on just the characters who use both genres. They are:

 [1] "BAPTISTA"  "BIANCA"    "BIONDELLO" "GREMIO"    "GRUMIO"   
 [6] "HORTENSIO" "KATHARINA" "LORD"      "LUCENTIO"  "PEDANT"   
[11] "PETRUCHIO" "SLY"       "TAILOR"    "TRANIO"    "VINCENTIO"
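Selecting the characters who use both genres amounts to collecting the set of genres each character speaks and keeping those whose set contains both verse and prose. This is a minimal sketch under a hypothetical row layout with `character` and `genre` fields. The data are placeholders. Sorting the names mirrors the alphabetical order of the list above.

```python
# Hypothetical toy data: one row per speech, placeholder text only.
speeches = [
    {"character": "PETRUCHIO", "genre": "verse", "act": 2, "text": "one two three four five"},
    {"character": "KATHARINA", "genre": "verse", "act": 2, "text": "one two three"},
    {"character": "GRUMIO", "genre": "prose", "act": 1, "text": "one two three four"},
    {"character": "PETRUCHIO", "genre": "prose", "act": 4, "text": "one two"},
]

def characters_using_both(rows, genres=("verse", "prose")):
    """Sorted names of characters with at least one speech in every listed genre."""
    used = {}
    for row in rows:
        used.setdefault(row["character"], set()).add(row["genre"])
    return sorted(c for c, gs in used.items() if set(genres) <= gs)

print(characters_using_both(speeches))
```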

Let’s see how their verse and prose are distributed across the five acts.

Figure 4: Speech volume across acts by genre.
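One plausible way to tabulate the counts behind Figure 4 is to total words by act and genre, restricted to the characters who use both genres. How the figure actually aggregates is an assumption here, since it may also break the counts down by character. The row layout, the placeholder data, and the `both` set are likewise hypothetical.

```python
from collections import Counter

# Hypothetical toy data: one row per speech, placeholder text only.
speeches = [
    {"character": "PETRUCHIO", "genre": "verse", "act": 2, "text": "one two three four five"},
    {"character": "KATHARINA", "genre": "verse", "act": 2, "text": "one two three"},
    {"character": "GRUMIO", "genre": "prose", "act": 1, "text": "one two three four"},
    {"character": "PETRUCHIO", "genre": "prose", "act": 4, "text": "one two"},
]
both = {"PETRUCHIO"}  # hypothetical: characters found to use both genres

def words_by_act_and_genre(rows, characters):
    """Word totals keyed by (act, genre), counting only the given characters."""
    totals = Counter()
    for row in rows:
        if row["character"] in characters:
            totals[(row["act"], row["genre"])] += len(row["text"].split())
    return totals

for (act, genre), n in sorted(words_by_act_and_genre(speeches, both).items()):
    print(f"Act {act}: {genre} {n}")
```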

Citation

For attribution, please cite this work as

Hinrichs (2020, Oct. 17). Genre and Character in The Taming of the Shrew: 2 - Some Numbers. Retrieved from https://titus-and-shrew.netlify.app

BibTeX citation

@misc{hinrichs2020shrew-2,
  author = {Hinrichs, Lars},
  title = {Genre and Character in The Taming of the Shrew: 2 - Some Numbers},
  url = {https://titus-and-shrew.netlify.app},
  year = {2020}
}