Volume 18, Issue 2
  • ISSN 0257-3784
  • E-ISSN: 2212-9731



The present study explores the applicability of Natural Language Processing (NLP) techniques to the investigation of Korean child corpora. We use caregiver input and child production data from the CHILDES database, currently the largest open-access collection of Korean child corpus data, and apply NLP techniques to the data in two ways: automatic Part-of-Speech tagging by adapting a machine learning algorithm, and (semi-)automatic extraction of constructional patterns expressing a transitive event (active transitive and suffixal passive). As the first empirical report on NLP-assisted analysis of Korean child corpora, this study is expected to reveal the advantages and drawbacks of this approach, thereby opening the way to further corpus-mediated research on child language development in Korean. The findings also have implications for research practice in corpus-based developmental studies of Korean, in particular for ensuring the reproducibility of procedures and results, which has often been lacking in previous corpus-based research on child language development in Korean.

Available under the CC BY-NC 4.0 license.


  • Article Type: Research Article
Keyword(s): caregiver input; child production; Natural Language Processing