Automatic dialect classification of the Southern Dutch dialects
- Source: Nota Bene, Volume 2, Issue 2, Oct 2025, p. 448 - 473
- 09 Apr 2025
- 24 Jun 2025
- 31 Oct 2025
Abstract
Since the 1980s, computational methods have been applied in dialectology, a field known as dialectometry (cf. Goebl 1984; Heeringa 2004). Many of these methods were designed for data from dialect surveys or linguistic atlases, which typically consist of elicited items uttered in isolation. More recently, scholars have turned to corpus-based approaches to find dialect patterns in more naturalistic speech, which can tell us more about the contexts in which variants are used and the magnitude of their use (Kuparinen and Scherrer 2024).
Transcriptions of spontaneous speech pose challenges for traditional approaches to automatic dialect classification: the transcriptions are too numerous to inspect manually; they are not systematic word lists; and extracting only the frequencies of known features risks overlooking features that have not yet been discovered.
This paper employs topic modelling to automatically detect dialect groups among the southern Dutch dialects. Because the method is data-driven, it can overcome the issues mentioned above. The results show that the southern Dutch dialects can be divided into two to four major groups, which coincide with the traditional classification (Taeldeman 2001).
