@article{jbp:/content/journals/10.1075/ijcl.17062.ben, author = "Bentum, Martijn and ten Bosch, Louis and van den Bosch, Antal and Ernestus, Mirjam", title = "Do speech registers differ in the predictability of words?", journal = "International Journal of Corpus Linguistics", year = "2019", volume = "24", number = "1", pages = "98--130", doi = "10.1075/ijcl.17062.ben", url = "https://www.jbe-platform.com/content/journals/10.1075/ijcl.17062.ben", publisher = "John Benjamins", issn = "1384-6655", type = "Journal Article", keywords = "register analysis, word predictability, speech registers, statistical language modelling, text classification",
abstract = "Previous research has demonstrated that language use can vary depending on the context of situation. The present paper extends this finding by comparing word predictability differences between 14 speech registers ranging from highly informal conversations to read-aloud books. We trained 14 statistical language models to compute register-specific word predictability and trained a register classifier on the perplexity score vector of the language models. The classifier distinguishes perfectly between samples from all speech registers and this result generalizes to unseen materials. We show that differences in vocabulary and sentence length cannot explain the speech register classifier’s performance. The combined results show that speech registers differ in word predictability.", }
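The abstract describes a pipeline in which one language model per register scores a sample, the scores form a perplexity vector, and a classifier maps that vector to a register. The Python below is a minimal sketch of that idea under stated assumptions. The abstract does not specify the model type, smoothing, or classifier, so this sketch assumes add-one smoothed bigram models and a nearest-centroid classifier, and it uses two invented toy registers (conversation, read_aloud) instead of the paper's 14. The names BigramLM, perplexity_vector and NearestCentroid are hypothetical and are not the authors' code.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def tokenize(sentence, vocab):
    """Whitespace tokenization; words outside the shared vocabulary become <unk>."""
    words = [w if w in vocab else UNK for w in sentence.split()]
    return [BOS] + words + [EOS]


class BigramLM:
    """Add-one smoothed bigram model (an assumed stand-in, not the paper's model)."""

    def __init__(self, sentences, vocab):
        self.vocab = vocab            # shared across registers so scores are comparable
        self.context = Counter()      # c(prev)
        self.pairs = Counter()        # c(prev, word)
        for s in sentences:
            toks = tokenize(s, vocab)
            self.context.update(toks[:-1])
            self.pairs.update(zip(toks[:-1], toks[1:]))

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (self.pairs[(prev, word)] + 1) / (self.context[prev] + len(self.vocab))

    def log_perplexity(self, sample):
        """Mean negative log-probability per predicted token; exp() of this is perplexity."""
        total, n = 0.0, 0
        for s in sample:
            toks = tokenize(s, self.vocab)
            for prev, word in zip(toks[:-1], toks[1:]):
                total -= math.log(self.prob(prev, word))
                n += 1
        return total / n


def perplexity_vector(sample, models):
    """One log-perplexity score per register model, in a fixed (sorted) register order."""
    return [models[r].log_perplexity(sample) for r in sorted(models)]


class NearestCentroid:
    """Hypothetical stand-in classifier: assign the register with the closest mean vector."""

    def fit(self, vectors, labels):
        self.centroids = {}
        for label in set(labels):
            rows = [v for v, lab in zip(vectors, labels) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, vector):
        return min(self.centroids, key=lambda label: math.dist(vector, self.centroids[label]))


# Invented toy data: two registers only, for illustration.
lm_data = {
    "conversation": ["yeah i know right", "uh yeah so i mean", "well yeah i know"],
    "read_aloud": ["the old king ruled the quiet kingdom",
                   "the river flowed past the castle",
                   "the king walked to the river"],
}
vocab = {w for sents in lm_data.values() for s in sents for w in s.split()} | {EOS, UNK}
models = {r: BigramLM(sents, vocab) for r, sents in lm_data.items()}

# Held-out samples (each a list of sentences), used only to fit the classifier.
clf_data = [
    (["yeah i know"], "conversation"),
    (["uh yeah i mean"], "conversation"),
    (["the king ruled the river"], "read_aloud"),
    (["the old castle"], "read_aloud"),
]
clf = NearestCentroid().fit(
    [perplexity_vector(s, models) for s, _ in clf_data],
    [label for _, label in clf_data],
)

print(clf.predict(perplexity_vector(["yeah so i know"], models)))    # prints conversation
print(clf.predict(perplexity_vector(["the river flowed"], models)))  # prints read_aloud
```

The language models and the classifier are fit on separate toy samples so that a sample's score is not artificially lowered by a model that has already seen it. Log-perplexity is used as the feature in this sketch to keep the vector components on a comparable scale; that choice is an assumption, not something the abstract states.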