Volume 27, Issue 1
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



This paper presents the Sociolinguistic Speech Corpus of Chilean Spanish (COSCACH) v1.0, a 9.3-million-word corpus containing transcribed, lemmatized and morphologically tagged text, audio recordings and videos from 1,237 L1 speakers of Chilean Spanish, as well as a control sample of 21 non-Chilean L1 Spanish speakers. The COSCACH is the first freely available corpus of spoken Chilean Spanish of substantial size, as well as one of the largest speech corpora of any variety of Spanish. Following a review of other Chilean speech corpora, I describe how the COSCACH was constructed, covering corpus design, speaker recruitment and metadata collection, speech elicitation and recording, transcription, lemmatization and morphological tagging, and corpus compilation. I thereby aim to provide a blueprint for creating modern, large-scale speech corpora suitable for phonetic, sociophonetic and sociolinguistic research, in addition to traditional inquiry into semantics, lexis, grammar, pragmatics and discourse.




