Volume 2, Issue 2
  • ISSN: 2950-189X
  • E-ISSN: 2950-1881

Abstract

Since the 1980s, computational methods have been applied in dialectology, an approach known as dialectometry (cf. Goebl 1984; Heeringa 2004). Many of these methods were designed for data from dialect surveys or linguistic atlases, which typically consist of elicited items uttered in isolation. Scholars have since turned to corpus-based approaches to find dialect patterns in more naturalistic speech, which can reveal more about the contexts in which variants are used and the extent of their use (Kuparinen and Scherrer 2024).

Transcriptions of spontaneous speech pose challenges for traditional approaches to automatic dialect classification: it is not feasible to go through all the transcriptions manually; the transcriptions are not systematic word lists; and merely extracting the frequencies of known features risks overlooking features that have not yet been discovered.

This paper employs topic modelling to automatically detect dialect groups among the southern Dutch dialects. The method is data-driven and can therefore overcome the issues mentioned above. The results show that the southern Dutch dialects can be divided into two to four major groups, which coincide with the traditional classification (Taeldeman 2001).
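The abstract does not spell out the modelling pipeline. As a minimal sketch of the general idea, assuming Latent Dirichlet Allocation (Blei, Ng & Jordan 2003) rather than whatever model and settings the paper actually uses, one can treat the transcription of each location as a single document, count every word token (so no features have to be chosen in advance), fit topics by collapsed Gibbs sampling, and label each location with its most probable topic. The function names `lda_gibbs` and `group_locations`, the whitespace tokenisation, the hyperparameters (`alpha`, `beta`, `n_iter`) and the dominant-topic rule are illustrative assumptions, not the authors' implementation.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions.

    docs: list of token lists (assumption: one list per recording location).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(n_topics)]  # tokens per (topic, word type)
    nk = [0] * n_topics                       # tokens per topic
    z = []                                    # current topic of every token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... and resample its topic from the full conditional.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior mean of the document-topic proportions (each row sums to 1).
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def group_locations(transcripts, n_groups, **lda_kwargs):
    """Hypothetical helper: label each location with its dominant topic."""
    names = sorted(transcripts)
    docs = [transcripts[name].lower().split() for name in names]  # naive tokenisation
    theta = lda_gibbs(docs, n_groups, **lda_kwargs)
    return {name: max(range(n_groups), key=row.__getitem__)
            for name, row in zip(names, theta)}


if __name__ == "__main__":
    # Invented toy "transcriptions" with placeholder tokens, not GCND data.
    toy = {
        "loc1": "aa bb aa cc bb aa", "loc2": "bb aa cc aa bb",
        "loc3": "xx yy zz yy xx", "loc4": "yy xx zz xx yy zz",
    }
    print(group_locations(toy, n_groups=2))
```

Because the number of topics fixes the number of candidate groups, the sketch would be run once per value of `n_groups` (for example 2, 3 and 4) and the resulting maps compared. For corpora of realistic size, an established implementation such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel` would normally replace the hand-written sampler.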

Available under the CC BY 4.0 license.

/content/journals/10.1075/nb.00043.sun
2025-10-31
2025-12-04

References

  1. Anderwald, Lieselotte & Benedikt Szmrecsanyi
    2009 Corpus linguistics and dialectology. In Anke Lüdeling & Merja Kytö (eds.), Corpus linguistics: An international handbook, vol. 2, 1126–1139. https://doi.org/10.1515/9783110213881.2.1126
  2. Barbiers, Sjef, Hans Bennis, Gunther De Vogelaer, Magda Devos & Margreet van der Ham
    2005 Syntactische atlas van de Nederlandse dialecten, vol. 1. Amsterdam: Amsterdam University Press.
  3. Barbiers, Sjef, Johan van der Auwera, Hans Bennis, Eefje Boef, Gunther De Vogelaer & Margreet van der Ham
    2008 Syntactische atlas van de Nederlandse dialecten, vol. 2. Amsterdam: Amsterdam University Press. https://doi.org/10.5117/9789053567791
  4. Blei, David M.
    2012 Probabilistic topic models. Communications of the ACM 55(4). 77–84. https://doi.org/10.1145/2133806.2133826
  5. Blei, David M., Andrew Y. Ng & Michael I. Jordan
    2003 Latent Dirichlet allocation. Journal of Machine Learning Research 3(1). 993–1022.
  6. Borg, Ingwer & Patrick J. F. Groenen
    2005 Modern multidimensional scaling: Theory and applications. New York: Springer.
  7. Breitbarth, Anne, Melissa Farasyn, Anne-Sophie Ghyselen & Jacques Van Keymeulen
    2018 Het Gesproken Corpus van de zuidelijk-Nederlandse Dialecten. Handelingen Koninklijke Zuid-Nederlandse Maatschappij voor Taal- en Letterkunde en Geschiedenis 72(1). https://doi.org/10.21825/kzm.v72i0.17914
  8. Breitbarth, Anne, Melissa Farasyn, Anne-Sophie Ghyselen, Lien Hellebaut, Frederic Lamsens, Katrien Depuydt, Jesse de Does, Jan Niestadt & Koen Mertens
    2024 Gesproken Corpus van de zuidelijk-Nederlandse Dialecten. 1st release, October 2024. Available at the Dutch Language Institute: https://hdl.handle.net/10032/tm-a2-z9
  9. Chambers, Jack K. & Peter Trudgill
    1998 Dialectology (2nd ed.). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511805103
  10. Ghyselen, Anne-Sophie, Jacques Van Keymeulen, Melissa Farasyn, Lien Hellebaut & Anne Breitbarth
    2020 Het transcriptieprotocol van het Gesproken Corpus van de Nederlandse Dialecten (GCND). Bulletin de la Commission royale de toponymie & dialectologie 92(1). 83–115. https://doi.org/10.21825/hctd.88842
  11. Goebl, Hans
    1984 Dialektometrische Studien: Anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF (Beihefte zur Zeitschrift für romanische Philologie 191–193). Tübingen: Niemeyer.
  12. Goebl, Hans
    2018 Dialectometry. In Charles Boberg, John Nerbonne & Dominic Watt (eds.), The handbook of dialectology, 123–142. New Jersey: Wiley-Blackwell.
  13. Grootaers, Ludovic & Gesinus Kloeke
    1926 Handleiding bij het Noord- en Zuid-Nederlandsch dialectonderzoek: Met een kaart. ’s-Gravenhage: Martinus Nijhoff. https://doi.org/10.1007/978-94-011-9148-7
  14. Grootendorst, Maarten
    2022 BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794.
  15. Heeringa, Wilbert
    2004 Measuring dialect pronunciation differences using Levenshtein distance. PhD thesis, University of Groningen.
  16. Kuparinen, Olli & Yves Scherrer
    2024 Corpus-based dialectometry with topic models. Journal of Linguistic Geography 12(1). 1–12. https://doi.org/10.1017/jlg.2024.6
  17. Lameli, Alfred & Andreas Schönberg
    2023 A measure for linguistic coherence in spatial language variation. In Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), 133–141. https://doi.org/10.18653/v1/2023.vardial-1.13
  18. Levenshtein, Vladimir I.
    1966 Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8). 707–710.
  19. Manning, Christopher & Hinrich Schütze
    1999 Foundations of statistical natural language processing. Cambridge: MIT Press.
  20. Nerbonne, John
    2011 Mapping aggregate variation. In Alfred Lameli, Roland Kehrein & Stefan Rabanus (eds.), An international handbook of linguistic variation, vol. 2: Language mapping, 476–501. Berlin, New York: De Gruyter Mouton. https://doi.org/10.1515/9783110219166.1.476
  21. Nerbonne, John & Peter Kleiweg
    2007 Toward a dialectological yardstick. Journal of Quantitative Linguistics 14(2–3). 148–166. https://doi.org/10.1080/09296170701379260
  22. Orton, Harold
    1962 Survey of the English dialects: Introduction. Leeds: E. J. Arnold & Son.
  23. Paatero, Pentti & Unto Tapper
    1994 Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2). 111–126. https://doi.org/10.1002/env.3170050203
  24. QGIS Development Team
    2025 QGIS geographic information system. Open Source Geospatial Foundation Project.
  25. Ryckeboer, Hugo
    2013 A West Flemish dialect as a minority language in the north of France. In Frans Hinskens & Johan Taeldeman (eds.), An international handbook of linguistic variation, vol. 3: Dutch, 782–800. Berlin, Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110261332.782
  26. Siewert, Janine, Yves Scherrer & Martijn Wieling
    2022 Low Saxon dialect distances at the orthographic and syntactic level. In Nina Tahmasebi, Syrielle Montariol, Andrey Kutuzov, Simon Hengchen, Haim Dubossarsky & Lars Borin (eds.), 3rd International Workshop on Computational Approaches to Historical Language Change (LChange) 2022, 119–124. https://doi.org/10.18653/v1/2022.lchange-1.12
  27. Spruit, Marco R., Wilbert Heeringa & John Nerbonne
    2009 Associations among linguistic levels. Lingua 119(11). 1624–1642. https://doi.org/10.1016/j.lingua.2009.02.001
  28. Sung, Ho Wang Matthew & Jelena Prokić
    2024 Identification of dialect typicality and kernels. 12th International Conference on Language Variation in Europe (ICLaVE|12). (Oral presentation)
  29. Szmrecsanyi, Benedikt
    2013 Grammatical variation in British English dialects: A study in corpus-based dialectometry. Cambridge: Cambridge University Press.
  30. Szmrecsanyi, Benedikt & Lieselotte Anderwald
    2018 Corpus-based approaches to dialect study. In Charles Boberg, John Nerbonne & Dominic Watt (eds.), The handbook of dialectology, 300–313. New Jersey: Wiley-Blackwell.
  31. Taeldeman, Johan
    2001 De regenboog van de Vlaamse dialecten. In Johan Taeldeman, Magda Devos & Johan De Caluwe (eds.), Het taallandschap in Vlaanderen, 49–58. Ghent: Academia Press.
  32. Taeldeman, Johan & Hermann Niebaum
    2013 History and development of Dutch dialect research. In Frans Hinskens & Johan Taeldeman (eds.), An international handbook of linguistic variation, vol. 3: Dutch, 13–35. Berlin, Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110261332.13
  33. Trudgill, Peter
    1999 The dialects of England (2nd ed.). Oxford: Blackwell.
  34. Van Keymeulen, Jacques, Anne Breitbarth, Anne-Sophie Ghyselen & Melissa Farasyn
    2020 Transcriptieproject ‘Stemmen uit het verleden’. Transcriptieprotocol. Available at: https://www.gcnd.ugent.be/wp-content/uploads/2024/03/2024_03_29_Transcriptieprotocol.pdf
  35. Vanacker, Valeer F. & Georges De Schutter
    1967 Zuidnederlandse dialekten op de band. Taal en Tongval 19(1). 35–51.
  36. Virpioja, Sami, Peter Smit, Stig-Arne Grönroos & Mikko Kurimo
    2013 Morfessor 2.0: Python implementation and extensions for Morfessor baseline. (https://aaltodoc.aalto.fi/handle/123456789/11836)
  37. Weijnen, Antonius A.
    1966 (= 1958) Nederlandse dialectkunde. Van Gorcum.
  38. Wiesinger, Peter
    1983 Die Einteilung der deutschen Dialekte. In Werner Besch, Ulrich Knoop, Wolfgang Putschke & Herbert E. Wiegand (eds.), Dialektologie, 2. Halbband, 807–899. De Gruyter. https://doi.org/10.1515/9783110203332.807