Linguistic Analysis of Stage Directions in Russian Drama from the 18th to the 20th Century, 3rd year term paper at NRU HSE, Moscow, Russia
Stage directions, quite literally, don’t count.
In: Hardin L. Aasand (ed.): Stage Directions in Hamlet. New essays and new directions. Madison et al. 2003, p. 226.
This repo contains the code for my third-year coursework, titled Linguistic Analysis of Stage Directions in Russian Drama from the 18th to the 20th Century, so it’s going to be all stage directions and all linguistic :)
My slides for the EADH 2018 conference cover basically everything I did for this course paper.
Perform some neat corpus analysis on the Russian Drama Corpus.
A great result would be the classification of stage directions according to the TEI P5 markup standard, which distinguishes 9 types of stage directions.
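For reference, the TEI P5 Guidelines suggest nine values for the `type` attribute of the `<stage>` element. Below is a minimal sketch of how those labels could be kept as a constant and checked; the names `STAGE_TYPES` and `is_valid_stage_type` are hypothetical and are not taken from the notebooks in this repo.

```python
# Suggested values for the `type` attribute of <stage> in TEI P5.
# The constant and helper names are illustrative assumptions.
STAGE_TYPES = (
    "setting", "entrance", "exit", "business", "novelistic",
    "delivery", "modifier", "location", "mixed",
)


def is_valid_stage_type(label: str) -> bool:
    """Return True if `label` is one of the nine suggested types."""
    return label.strip().lower() in STAGE_TYPES
```

A classifier's predicted labels could be passed through such a check before being written back into the markup.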
File/folder | What’s inside |
---|---|
csv/ | Comma-separated files with datasets |
figures/ | Figures from plot-plays.ipynb |
requirements.txt | List of packages required to run the notebooks |
directions-basic.ipynb | Extracting some basic information about plays |
means-merged-features.ipynb | Mean POS counts, merging with another dataset |
plot-plays.ipynb | Drawing different plots visualising the data we got |
classification.ipynb | Classifying the directions into TEI P5 types |
frequent-pos.ipynb | Most frequent parts of speech in the corpus |
All the dependencies are listed in `requirements.txt`. As a side note, most of the packages ship with Anaconda. If you have it installed, you only need to install `nltk` yourself and then download the NLTK data. In Python, that looks as follows:
```python
import nltk

# Opens the interactive NLTK downloader for choosing data packages
nltk.download()
```
I’m using RusDraCor, the Russian Drama Corpus. It can be explored on its site, and it can also be downloaded from its GitHub repository.
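As a rough illustration of how stage directions can be pulled out of a play, here is a minimal sketch. It assumes the play is a TEI XML file in the standard TEI namespace with directions marked up as `<stage>` elements. The `SAMPLE` document and the function name `extract_stage_directions` are made up for this example and are not part of the corpus or the notebooks.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the prefix "tei" is only a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented toy document, not an actual play from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <sp who="#anna">
      <speaker>Anna</speaker>
      <stage>(entering)</stage>
      <p>Good evening.</p>
    </sp>
    <stage>Exit Anna.</stage>
  </body></text>
</TEI>"""


def extract_stage_directions(xml_text):
    """Return the text of every <stage> element in document order."""
    root = ET.fromstring(xml_text)
    directions = []
    for stage in root.iterfind(".//tei:stage", TEI_NS):
        # Join nested text nodes and collapse runs of whitespace.
        text = " ".join("".join(stage.itertext()).split())
        if text:
            directions.append(text)
    return directions


print(extract_stage_directions(SAMPLE))
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring` with the rest of the function unchanged.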