Volume 49, Issue 1
  • ISSN: 0155-0640
  • E-ISSN: 1833-7139

Abstract

This paper addresses issues in the design and compilation of the first spoken corpus of youth talk in Turkish, a language under-represented in corpus linguistics. Designed to offer a maximally representative sample of Turkish youth talk, the Corpus of Turkish Youth Language (CoTY) is a 168,748-token specialised corpus of a single register: informal, naturally occurring, spontaneous interaction exclusively among friends. The speakers are Turkish-speaking youth aged 14 to 18 from diverse socio-economic backgrounds in Türkiye. The paper presents the issues that surfaced during corpus design and construction, and discusses and justifies the methodological choices made in relation to the project's long-term objectives. The corpus offers the field a valuable resource and tool for cross-linguistic research on youth language. As its overarching goal, the project also aims to add to the cumulative linguistic and methodological knowledge of spoken corpus design and construction.

/content/journals/10.1075/aral.25007.efe
2025-07-28
2026-05-15
  • Article Type: Research Article
  • Keyword(s): corpus construction; corpus design; spoken corpus; Turkish; youth talk