Volume 47, Issue 4
  • ISSN: 0378-4177
  • E-ISSN: 1569-9978

Abstract

There are high hopes that corpus-based approaches to language complexity will help explain linguistic diversity. Accordingly, several complexity indices have been proposed to compare languages along various dimensions, especially in phonology and morphology. However, the robustness of these indices to changes in corpus size and content has not been systematically assessed, which hampers comparison across studies. Here, we systematically test the robustness of four complexity indices estimated from raw texts: two that are routinely used in cross-linguistic studies (Type-Token Ratio and word-level Entropy) and two that were proposed more recently (Word Information Density and Lexical Diversity). Our results on 47 languages strongly suggest that the traditional indices are more prone to fluctuation than the newer ones. Additionally, using Word Information Density, we confirm the existence of a cross-linguistic trade-off between word-internal and across-word distributions of information. Finally, we present a proof of concept suggesting that modern deep-learning language models can improve comparability across languages when the datasets are not parallel.
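The abstract names the two traditional indices without defining them. As a rough illustration only, the sketch below computes them with the simplest textbook formulas: Type-Token Ratio as TTR = V / N (V distinct word types, N word tokens) and word-level entropy as H = -Σ p(w) log2 p(w), with p(w) estimated by relative frequency (the plug-in estimator). Whitespace tokenization, the plug-in estimator, the toy text and the helper `indices_by_prefix` are all assumptions made for this example, not the preprocessing or estimators used in the study. Word Information Density and Lexical Diversity are left out because their exact definitions are not given here.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """TTR = number of distinct word types / number of word tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def word_entropy(tokens):
    """Plug-in (maximum-likelihood) Shannon entropy of the word distribution, in bits.

    Relative frequencies stand in for probabilities; this simple estimator
    is known to be biased on small samples.
    """
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def indices_by_prefix(tokens, sizes):
    """Hypothetical helper: recompute both indices on the first k tokens for each k."""
    return {k: (type_token_ratio(tokens[:k]), word_entropy(tokens[:k])) for k in sizes}


if __name__ == "__main__":
    # Toy "corpus", tokenized on whitespace (an assumption for illustration).
    tokens = "the cat sat on the mat the dog sat on the log".split()
    for k, (ttr, h) in indices_by_prefix(tokens, [6, 12]).items():
        print(f"{k:>3} tokens: TTR = {ttr:.3f}, entropy = {h:.3f} bits")
```

On this toy text, TTR drops from 5/6 to 7/12 when the text length doubles, while the entropy estimate rises from about 2.25 to about 2.58 bits. Neither value is a property of the language alone; both shift with the amount of text, which is the kind of size dependence whose magnitude the study assesses.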

Available under the CC BY 4.0 license.

DOI: 10.1075/sl.22034.oh
2022-12-20
2024-12-02