Measuring the similarity between languages

Abstract

In typology, statistical methods have been used successfully to assess similarities and differences between languages. In creole studies, by contrast, the use of quantitative methods has been controversial, and many methodological aspects of the statistical models used have been criticized (e.g. Meakins 2022; Bakker 2023). This paper investigates two methodological problems that have not yet been examined critically: which statistical models produce which results, and how the amount of missing values in a data set influences the results. We present a study in which we tested different statistical models on 21 features from two arbitrarily chosen domains (‘Word Order’ and ‘Nominal Categories’) from the WALS (Dryer & Haspelmath 2013) and APiCS (Michaelis et al. 2013) databases. We demonstrate that different statistical methods yield similar results, and that different sample sizes do not dramatically influence the model outcomes.
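
To make the missing-value question concrete, the following is a minimal sketch of one common way to compare languages coded for categorical features: a normalized Hamming distance computed only over the features attested in both languages. It is an illustration under stated assumptions, not the models tested in the paper. The language names, the feature values, and the helper names `normalized_hamming` and `distance_table` are all hypothetical.

```python
from itertools import combinations
from typing import Optional

# Hypothetical toy data: one list of categorical feature values per language.
# None marks a missing value (a feature not coded for that language).
toy = {
    "LangA": ["SVO", "Prep", None, "NoGender"],
    "LangB": ["SVO", "Post", "DefArt", "NoGender"],
    "LangC": ["SOV", "Post", "DefArt", None],
}


def normalized_hamming(a: list[Optional[str]],
                       b: list[Optional[str]]) -> Optional[float]:
    """Share of mismatching values among features attested in both languages.

    Returns None when the two languages share no attested feature,
    because no distance can be estimated in that case.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)


def distance_table(data: dict[str, list[Optional[str]]]):
    """Pairwise distances for all language pairs, keyed by sorted name pair."""
    return {
        (l1, l2): normalized_hamming(data[l1], data[l2])
        for l1, l2 in combinations(sorted(data), 2)
    }


table = distance_table(toy)
for pair, dist in table.items():
    print(pair, dist)
```

Because each pair is compared only on the features both languages have, the number of features behind each distance varies from pair to pair. This is one way in which the amount of missing data can affect a similarity estimate.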

/content/journals/10.1075/jpcl.24026.pla
2025-05-26
2025-06-24

References

  1. Baayen, Harald
    2008 Analyzing linguistic data. A practical introduction to statistics. Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9780511801686
  2. Baker, Philip
    1990 Off target? Journal of Pidgin and Creole Languages.
    https://doi.org/10.1075/jpcl.5.1.07bak
  3. Bakker, Peter
    2023 Empiricism against imperialism: Science, dogma and the neocolonial heritage of creole studies. Reflections on Meakins (2022). Journal of Pidgin and Creole Languages.
    https://doi.org/10.1075/jpcl.00119.bak
  4. Bakker, Peter, Finn Borchsenius, Carsten Levisen & Eeva M. Sippola
    2017 Creole studies: Phylogenetic approaches. John Benjamins Publishing Company.
    https://doi.org/10.1075/z.211
  5. Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall & Ingo Plag
    2011 Creoles are typologically distinct from non-creoles. Journal of Pidgin and Creole Languages.
    https://doi.org/10.1075/jpcl.26.1.02bak
  6. Bickel, Balthasar
    2007 Typology in the 21st century: Major current developments. Linguistic Typology.
    https://doi.org/10.1515/LINGTY.2007.018
  7. Blasi, Damián E., Susanne Maria Michaelis & Martin Haspelmath
    2017 Grammars are robustly transmitted even during the emergence of creole languages. Nature Human Behaviour.
    https://doi.org/10.1038/s41562-017-0192-4
  8. Bouckaert, Remco, Philippe Lemey, Michael Dunn, Simon J. Greenhill, Alexander V. Alekseyenko, Alexei J. Drummond, Russell D. Gray, Marc A. Suchard & Quentin D. Atkinson
    2012 Mapping the origins and expansion of the Indo-European language family. Science.
    https://doi.org/10.1126/science.1219669
  9. Cysouw, Michael
    2008 Using the World Atlas of Language Structures. Language Typology and Universals.
    https://doi.org/10.1524/stuf.2008.0018
  10. Daval-Markussen, Aymeric
    2019 Reconstructing creole. Aarhus: Aarhus University PhD dissertation.
  11. DeGraff, Michel
    2003 Against creole exceptionalism. Language.
    https://doi.org/10.1353/lan.2003.0114
  12. Dryer, Matthew S. & Martin Haspelmath
    2013 WALS Online (v2020.3). Zenodo.
    https://doi.org/10.5281/zenodo.7385533
  13. Dunn, Michael, Angela Terrill, Ger Reesink, Robert A. Foley & Stephen C. Levinson
    2005 Structural phylogenetics and the reconstruction of ancient language history. Science.
    https://doi.org/10.1126/science.1114615
  14. Efron, Bradley
    1983 Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association.
    https://doi.org/10.1080/01621459.1983.10477973
  15. Fantini, Damiano
    2018 colorhcplot: Colorful hierarchical clustering dendrograms.
    https://doi.org/10.32614/CRAN.package.colorhcplot
  16. Friedman, Susan Stanford
    2013 Why not compare? In Rita Felski & Susan Stanford Friedman (eds.), Comparison: Theories, approaches, uses. Johns Hopkins University Press.
  17. Gorman, Ben
    2016 mltools: Machine Learning Tools. R package version 0.3.5. Comprehensive R Archive Network. https://CRAN.R-project.org/package=mltools
    https://doi.org/10.32614/CRAN.package.mltools
  18. Guzmán Naranjo, Matías & Laura Becker
    2022 Statistical bias control in typology. Linguistic Typology.
    https://doi.org/10.1515/lingty-2021-0002
  19. Haspelmath, Martin
    2010 Comparative concepts and descriptive categories in crosslinguistic studies. Language.
    https://doi.org/10.1353/lan.2010.0021
  20. Holm, John & Peter L. Patrick
    2007 Comparative creole syntax. Battlebridge.
  21. Jaeger, T. Florian, Peter Graff, William Croft & Daniel Pontillo
    2011 Mixed effect models for genetic and areal dependencies in linguistic typology. Linguistic Typology.
    https://doi.org/10.1515/lity.2011.021
  22. Kocka, Jürgen
    2003 Comparison and beyond. History and Theory.
    https://doi.org/10.1111/1468-2303.00228
  23. Kuhn, Max
    2008 Building predictive models in R using the caret package. Journal of Statistical Software. https://www.jstatsoft.org/index.php/jss/article/view/v028i05
    https://doi.org/10.18637/jss.v028.i05
  24. Lander, Yury & Peter Arkadiev
    2016 On the right of being a comparative concept. Linguistic Typology.
    https://doi.org/10.1515/lingty-2016-0014
  25. Lefebvre, Claire
    1998 Creole genesis and the acquisition of grammar: The case of Haitian Creole. Cambridge: Cambridge University Press.
  26. Levy, Dan & Lior Pachter
    2011 The neighbor-net algorithm. Advances in Applied Mathematics.
    https://doi.org/10.1016/j.aam.2010.09.002
  27. Lindstromberg, Seth
    2022 P-curving as a safeguard against p-hacking in SLA research: A case study. Studies in Second Language Acquisition.
    https://doi.org/10.1017/S0272263121000516
  28. List, Johann-Mattis
    2021 Computer-assisted approaches to historical language comparison. Jena: Friedrich-Schiller-Universität Jena, Philosophische Fakultät Habilitation Thesis.
    https://doi.org/10.22032/dbt.49007
  29. Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert & Kurt Hornik
    2025 cluster: Cluster analysis basics and extensions. R package version 2.1.8.1. https://CRAN.R-project.org/package=cluster
  30. McWhorter, John H.
    1998 Identifying the creole prototype: Vindicating a typological class. Language.
    https://doi.org/10.2307/417003
  31. Meakins, Felicity
    2022 Empiricism or imperialism: The science of creole exceptionalism. Journal of Pidgin and Creole Languages.
    https://doi.org/10.1075/jpcl.00092.mea
  32. Michaelis, Susanne Maria, Philippe Maurer, Martin Haspelmath & Magnus Huber
    (eds.) 2013 The atlas of pidgin and creole language structures. Oxford University Press, USA.
  33. Murawaki, Yugo
    2016 Statistical modeling of creole genesis. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies.
    https://doi.org/10.18653/v1/N16-1158
  34. Muysken, Pieter
    1988 Are creoles a special type of language? In Frederick J. Newmeyer (ed.), Linguistics: The Cambridge survey. Cambridge University Press.
    https://doi.org/10.1017/CBO9780511621055.017
  35. Paradis, Emmanuel & Klaus Schliep
    2019 ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics.
    https://doi.org/10.1093/bioinformatics/bty633
  36. Parkvall, Mikael
    2008 The simplicity of creoles in a cross-linguistic perspective. In Matti Miestamo, Fred Karlsson & Kaius Sinnemäki (eds.), Language complexity: Typology, contact, change. John Benjamins Publishing Company.
    https://doi.org/10.1075/slcs.94.17par
  37. Plag, Ingo
    2008 Creoles as interlanguages: Inflectional morphology. Journal of Pidgin and Creole Languages.
    https://doi.org/10.1075/jpcl.23.1.06pla
  38. Plag, Ingo
    2011 Creolization and admixture: Typology, feature pools, and second language acquisition. Journal of Pidgin and Creole Languages.
    https://doi.org/10.1075/jpcl.26.1.04pla
  39. R Core Team
    2021 R: A language and environment for statistical computing. https://www.R-project.org/
  40. Radhakrishnan, R.
    2013 Why compare? In Rita Felski & Susan Stanford Friedman (eds.), Comparison: Theories, approaches, uses. Johns Hopkins University Press.
  41. Roettger, Timo B.
    2019 Researcher degrees of freedom in phonetic research. Laboratory Phonology.
    https://doi.org/10.5334/labphon.147
  42. Schliep, Klaus, Alastair J. Potts, David A. Morrison & Guido W. Grimm
    2017 Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution.
    https://doi.org/10.1111/2041-210X.12760
  43. Schliep, Klaus Peter
    2011 phangorn: Phylogenetic analysis in R. Bioinformatics.
    https://doi.org/10.1093/bioinformatics/btq706
  44. Skirgård, Hedvig, Hannah J. Haynie, Damián E. Blasi, Harald Hammarström, Jeremy Collins, Jay J. Latarche, Jakob Lesage, Tobias Weber, Alena Witzlack-Makarevich, Sam Passmore, Angela Chira, Luke Maurits, Russell Dinnage, Michael Dunn, Ger Reesink, Ruth Singer, Claire Bowern, Patience Epps, Jane Hill, Outi Vesakoski, Martine Robbeets, Noor Karolin Abbas, Daniel Auer, Nancy A. Bakker, Giulia Barbos, Robert D. Borges, Swintha Danielsen, Luise Dorenbusch, Ella Dorn, John Elliott, Giada Falcone, Jana Fischer, Yustinus Ghanggo Ate, Hannah Gibson, Hans-Philipp Göbel, Jemima A. Goodall, Victoria Gruner, Andrew Harvey, Rebekah Hayes, Leonard Heer, Roberto E. Herrera Miranda, Nataliia Hübler, Biu Huntington-Rainey, Jessica K. Ivani, Marilen Johns, Erika Just, Eri Kashima, Carolina Kipf, Janina V. Klingenberg, Nikita König, Aikaterina Koti, Richard G. A. Kowalik, Olga Krasnoukhova, Nora L. M. Lindvall, Mandy Lorenzen, Hannah Lutzenberger, Tônia R. A. Martins, Celia Mata German, Suzanne van der Meer, Jaime Montoya Samamé, Michael Müller, Saliha Muradoğlu, Kelsey Neely, Johanna Nickel, Miina Norvik, Cheryl Akinyi Oluoch, Jesse Peacock, India O. C. Pearey, Naomi Peck, Stephanie Petit, Sören Pieper, Mariana Poblete, Daniel Prestipino, Linda Raabe, Amna Raja, Janis Reimringer, Sydney C. Rey, Julia Rizaew, Eloisa Ruppert, Kim K. Salmon, Jill Sammet, Rhiannon Schembri, Lars Schlabbach, Frederick W. P. Schmidt, Amalia Skilton, Wikaliler Daniel Smith, Hilário de Sousa, Kristin Sverredal, Daniel Valle, Javier Vera, Judith Voß, Tim Witte, Henry Wu, Stephanie Yam, Jingting Ye, Maisie Yong, Tessa Yuditha, Roberto Zariquiey, Robert Forkel, Nicholas Evans, Stephen C. Levinson, Martin Haspelmath, Simon J. Greenhill, Quentin D. Atkinson & Russell D. Gray
    2023 Grambank reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss. Science Advances.
    https://doi.org/10.1126/sciadv.adg6175
  45. Milborrow, Stephen
    2011 rpart.plot. https://CRAN.R-project.org/package=rpart.plot
  46. Hothorn, Torsten & Achim Zeileis
    2009 partykit: A toolkit for recursive partytioning. R-Forge.R-project.org/projects/partykit/
  47. Hothorn, Torsten, Kurt Hornik, Carolin Strobl & Achim Zeileis
    2009 party: A laboratory for recursive partytioning. CRAN.R-project.org/package=party
  48. Winter, Bodo
    2019 Statistics for linguists: An introduction using R. Routledge.
    https://doi.org/10.4324/9781315165547
  49. Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan & Tommy Tsan-Yuk Lam
    2017 GGTREE: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution.
    https://doi.org/10.1111/2041-210X.12628

  • Article Type: Research Article
Keywords: typology; statistical modeling; phylogenetic network; non-creole; APiCS; WALS; similarity; creole