Volume 21, Issue 1
  • ISSN 1384-6655
  • E-ISSN: 1569-9811


Lectometry is a corpus-based methodology that explores, from an aggregate perspective, how multiple language-external dimensions shape language usage. The paper combines this methodology with Semantic Vector Space (SVS) modeling to investigate lexical variability in written Standard English, as sampled in the original Brown family of corpora (Brown, LOB, Frown and F-LOB). Based on a joint analysis of 303 lexical variables, which are semi-automatically extracted by means of an SVS, we find that lexical variation in the Brown family is systematically related to three lectal dimensions: discourse type (informative versus imaginative), standard variety (British English versus American English), and time period (1960s versus 1990s). It turns out that most lexical variables are sensitive to at least one of these three language-external dimensions, yet not every dimension has dedicated lexical variables: in particular, distinctive lexical variables for the real-time dimension fail to emerge.
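
As a rough illustration of the general idea behind a Semantic Vector Space, the sketch below represents each word by counts of the words that occur near it and compares words by cosine similarity, so that words used in similar contexts score as candidate near-synonyms. It is not the authors' pipeline: the toy sentences, the window size, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for this example, and the abstract does not specify the weighting or similarity measure actually used.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical data, purely for illustration).
toy = [
    "she parked the car near the station".split(),
    "she parked the automobile near the station".split(),
    "he ate the apple near the window".split(),
]

vecs = cooccurrence_vectors(toy)
sim_synonyms = cosine(vecs["car"], vecs["automobile"])
sim_unrelated = cosine(vecs["car"], vecs["apple"])
print(sim_synonyms, sim_unrelated)
```

In this toy setting, "car" and "automobile" share identical contexts and therefore score higher than "car" and "apple". On real corpora, raw counts are usually reweighted and the candidate sets still need manual checking, which is consistent with the abstract's description of the extraction as semi-automatic.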




  • Article Type: Research Article
Keyword(s): aggregation; lectometry; lexis; Semantic Vector Space models; Standard English