Analysis of POS frequencies by character and genre.
The Shrew text was annotated using the functionality of the cleanNLP package for R (Arnold 2017). Tagging for speaker and genre at the word level was included as in previous steps of the analysis. Below is a preview of the data after part-of-speech tagging.
doc_id | sid | tid | token | token_with_ws | lemma | upos | xpos | tid_source | relation | linenumber | speaker | genre |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | I | I | -PRON- | PRON | PRP | 3 | nsubj | 1 | SLY | prose |
1 | 1 | 2 | ’ll | ’ll | will | VERB | MD | 3 | aux | 1 | SLY | prose |
1 | 1 | 3 | pheeze | pheeze | pheeze | VERB | VB | 0 | root | 1 | SLY | prose |
1 | 1 | 4 | you | you | -PRON- | PRON | PRP | 3 | dobj | 1 | SLY | prose |
1 | 1 | 6 | in | in | in | ADP | IN | 3 | prep | 1 | SLY | prose |
1 | 1 | 7 | faith | faith | faith | NOUN | NN | 6 | pobj | 1 | SLY | prose |
2 | 1 | 1 | A | A | a | DET | DT | 2 | det | 2 | HOSTESS | prose |
2 | 1 | 2 | pair | pair | pair | NOUN | NN | 0 | root | 2 | HOSTESS | prose |
2 | 1 | 3 | of | of | of | ADP | IN | 2 | prep | 2 | HOSTESS | prose |
2 | 1 | 4 | stocks | stocks | stock | NOUN | NNS | 3 | pobj | 2 | HOSTESS | prose |
2 | 1 | 6 | you | you | -PRON- | PRON | PRP | 7 | nsubj | 2 | HOSTESS | prose |
2 | 1 | 7 | rogue | rogue | rogue | VERB | VBP | 2 | appos | 2 | HOSTESS | prose |
3 | 1 | 1 | Y | Y | Y | PROPN | NNP | 3 | nsubj | 3 | SLY | prose |
3 | 1 | 3 | are | are | be | AUX | VBP | 9 | ccomp | 3 | SLY | prose |
3 | 1 | 4 | a | a | a | DET | DT | 5 | det | 3 | SLY | prose |
3 | 1 | 5 | baggage | baggage | baggage | NOUN | NN | 3 | attr | 3 | SLY | prose |
3 | 1 | 7 | the | the | the | DET | DT | 8 | det | 3 | SLY | prose |
3 | 1 | 8 | Slys | Slys | Slys | PROPN | NNP | 9 | nsubj | 3 | SLY | prose |
3 | 1 | 9 | are | are | be | AUX | VBP | 0 | root | 3 | SLY | prose |
3 | 1 | 10 | no | no | no | DET | DT | 11 | det | 3 | SLY | prose |
We’ll extract the 14 speakers with the most words from the data. They are:
speaker | n |
---|---|
PETRUCHIO | 4557 |
TRANIO | 2361 |
KATHARINA | 1832 |
HORTENSIO | 1773 |
GRUMIO | 1657 |
LUCENTIO | 1443 |
BAPTISTA | 1271 |
GREMIO | 1187 |
LORD | 1080 |
BIONDELLO | 813 |
SLY | 539 |
BIANCA | 518 |
PEDANT | 393 |
VINCENTIO | 341 |
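The code behind this table is not shown here, and the original analysis was done in R. As a rough illustration of the step only, the following Python sketch counts tokens per speaker and keeps the largest totals. The miniature `tokens` list and the function name `top_speakers` are assumptions made for the example, not part of the original workflow.

```python
from collections import Counter

# Hypothetical miniature of the tagged table: one (token, speaker) pair per row.
tokens = [
    ("I", "SLY"), ("'ll", "SLY"), ("pheeze", "SLY"), ("you", "SLY"),
    ("A", "HOSTESS"), ("pair", "HOSTESS"), ("of", "HOSTESS"),
    ("Y", "SLY"), ("are", "SLY"),
]

def top_speakers(rows, n):
    """Return the n speakers with the most tokens as (speaker, count) pairs."""
    counts = Counter(speaker for _token, speaker in rows)
    return counts.most_common(n)

# With the full play, n=14 would yield a table like the one above.
print(top_speakers(tokens, 14))
```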
Higher frequencies of verbs (and attendant parts of speech) indicate dynamic communication, social intelligence, action-focused modes of thought, and relational psychology, i.e. a focus on relating to other characters (Pennebaker et al. 2014; Pennebaker, Mehl, and Niederhoffer 2003; Biber 1991).
Higher frequencies of nouns (and attendant parts of speech) indicate conceptual thinking, declarative intelligence, epistemological interest, fact-oriented modes of thought, and investigative/academic psychology.
Because analyses of POS tags rest on very high token counts, even small differences in proportions can be considered significant (Hinrichs, Smith, and Waibel 2010).
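One way to read this claim is in terms of statistical significance. The sketch below is an illustration under that assumption, not the procedure used in the cited work: a standard two-proportion z-test applied to the same two-point difference (20% vs. 18%) at two sample sizes. All counts are invented for the example.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 20% vs. 18% of tokens carry some tag.
z_small, p_small = two_proportion_z(100, 500, 90, 500)
z_large, p_large = two_proportion_z(10_000, 50_000, 9_000, 50_000)

print(f"n = 500 each:    z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 50,000 each: z = {z_large:.2f}, p = {p_large:.2e}")
```

The difference in proportions is identical in both calls. Only the number of tokens changes, and with it the p-value.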
The data has been part-of-speech tagged in the background. The tagger we’re using draws on a set of 36 different tags:
[1] "ADD" "CC" "CD" "DT" "EX" "FW" "IN" "JJ" "JJR"
[10] "JJS" "MD" "NN" "NNP" "NNPS" "NNS" "PDT" "POS" "PRP"
[19] "PRP$" "RB" "RBR" "RBS" "RP" "TO" "UH" "VB" "VBD"
[28] "VBG" "VBN" "VBP" "VBZ" "WDT" "WP" "WP$" "WRB" "XX"
Thankfully, it also has a set of meta-categories, so I won’t need to define any myself. They are:
[1] "ADJ" "ADP" "ADV" "AUX" "CCONJ" "DET" "INTJ" "NOUN"
[9] "NUM" "PART" "PRON" "PROPN" "SCONJ" "VERB" "X"
There are 15 of them.
We are interested in “verbal” vs. “nominal” style. These can be measured in the frequencies of the “VERB” and the “NOUN” tags, respectively, but I want to also include the attendant POS groups that co-vary with those two (Mair 1997; Hinrichs, Smith, and Waibel 2010):
So I’ll form two index groups:
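The membership of the two groups is not reproduced in this text, so the following Python sketch should be read as a guess at the mechanics only. The sets `NOUN_GROUP` and `VERB_GROUP` are hypothetical assignments of the meta-categories listed above, and the function names are likewise invented for the example.

```python
from collections import Counter

# ASSUMPTION: illustrative group membership only; the actual index groups
# used in the analysis are not given in the text.
NOUN_GROUP = {"NOUN", "PROPN", "ADJ", "ADP", "DET"}
VERB_GROUP = {"VERB", "AUX", "ADV", "PRON"}

def index_group(upos):
    """Map a meta-category tag to 'noun', 'verb', or 'other'."""
    if upos in NOUN_GROUP:
        return "noun"
    if upos in VERB_GROUP:
        return "verb"
    return "other"

def group_counts(upos_tags):
    """Count how many tags fall into each index group."""
    return Counter(index_group(tag) for tag in upos_tags)

# upos tags of "A pair of stocks you rogue" as shown in the preview table.
sample = ["DET", "NOUN", "ADP", "NOUN", "PRON", "VERB"]
print(group_counts(sample))
```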
Here are the relationships between verbal and nominal indices for the top 14 characters.
This graph does not communicate very clearly what we actually want to know, which is the ratio between frequency of tags in the noun group and those in the verb group. So let’s eliminate “other” and focus only on the verb and noun tags.
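As a minimal sketch of that step, the ratio can be computed from per-speaker group counts by simply ignoring the “other” tally. The speaker labels and counts below are invented for illustration, and `noun_verb_ratio` is a hypothetical helper, not a function from the original analysis.

```python
def noun_verb_ratio(counts):
    """Ratio of noun-group to verb-group tokens; 'other' is ignored."""
    noun = counts.get("noun", 0)
    verb = counts.get("verb", 0)
    if verb == 0:
        raise ValueError("no verb-group tokens; ratio undefined")
    return noun / verb

# Hypothetical group counts per speaker (not the play's real figures).
by_speaker = {
    "SPEAKER_A": {"noun": 400, "verb": 500, "other": 100},
    "SPEAKER_B": {"noun": 300, "verb": 200, "other": 50},
}

ratios = {speaker: noun_verb_ratio(c) for speaker, c in by_speaker.items()}

# List speakers from the most nominal to the most verbal style.
for speaker, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{speaker}: {ratio:.2f}")
```

A ratio above 1 means the speaker uses more noun-group than verb-group tags, and a ratio below 1 means the reverse.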
Finally, we can break up the noun:verb ratio for each speaker by genre.
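The same ratio can be split further by grouping on speaker and genre together. The Python sketch below assumes each token already carries a group label (“noun”, “verb”, or “other”). The rows, the genre label “verse”, and the function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows: (speaker, genre, index group) per token.
rows = [
    ("SPEAKER_A", "prose", "noun"), ("SPEAKER_A", "prose", "noun"),
    ("SPEAKER_A", "prose", "verb"), ("SPEAKER_A", "prose", "other"),
    ("SPEAKER_A", "verse", "noun"), ("SPEAKER_A", "verse", "verb"),
    ("SPEAKER_A", "verse", "verb"),
    ("SPEAKER_B", "prose", "noun"),
]

def ratio_by_speaker_and_genre(rows):
    """Noun:verb ratio for every (speaker, genre) pair with verb tokens."""
    tallies = defaultdict(lambda: {"noun": 0, "verb": 0})
    for speaker, genre, group in rows:
        if group in ("noun", "verb"):  # drop "other"
            tallies[(speaker, genre)][group] += 1
    # Pairs without any verb-group tokens have no defined ratio and are skipped.
    return {
        key: t["noun"] / t["verb"]
        for key, t in tallies.items()
        if t["verb"] > 0
    }

for (speaker, genre), ratio in sorted(ratio_by_speaker_and_genre(rows).items()):
    print(f"{speaker} / {genre}: {ratio:.2f}")
```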
For attribution, please cite this work as
Hinrichs (2020, Oct. 15). Genre and Character in The Taming of the Shrew: 4 - Part-of-Speech Distribution. Retrieved from https://titus-and-shrew.netlify.app
BibTeX citation
@misc{hinrichs2020shrew-4,
  author = {Hinrichs, Lars},
  title = {Genre and Character in The Taming of the Shrew: 4 - Part-of-Speech Distribution},
  url = {https://titus-and-shrew.netlify.app},
  year = {2020}
}