Volume 23, Issue 4
  • ISSN: 1384-6655
  • E-ISSN: 1569-9811



This short paper introduces BasiScript, a 9-million-word corpus of contemporary Dutch texts written by primary school children. The data were collected over a period of three years, during which 17,216 children contributed texts. Each word token in the corpus is annotated with its correct orthographic form, its lemma and its part of speech. The most frequent polysemous words have also been annotated for word meaning. In addition, every word in the lexicon derived from the BasiScript corpus has been annotated for frequency in the corpus and its subcorpora, dispersion, length, family size, family frequency, orthographic neighborhood size and orthographic neighborhood frequency. Images of the texts are available to researchers. The present article describes the corpus and uses frequency profiling to compare BasiScript with BasiLex, a Dutch corpus of texts that primary school children are likely to read (completed in 2015).
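
The abstract names frequency profiling as the comparison method but does not give the statistic behind it. In Rayson and Garside (2000), frequency profiling ranks words by a log-likelihood (G²) score. For a word that occurs a times in corpus 1 (c tokens) and b times in corpus 2 (d tokens), the expected frequencies are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and the score is LL = 2(a ln(a/E1) + b ln(b/E2)). The sketch below assumes that statistic. The function names `log_likelihood` and `frequency_profile` and the two toy token lists are hypothetical illustrations. They are not the authors' code, and they are not BasiScript or BasiLex data.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 score for a word seen a times in corpus 1 (c tokens) and
    b times in corpus 2 (d tokens); assumes c > 0 and d > 0."""
    total = a + b
    e1 = c * total / (c + d)  # expected frequency in corpus 1
    e2 = d * total / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def frequency_profile(tokens1, tokens2):
    """Rank every word in either corpus by its log-likelihood score.

    Returns (word, score, direction) tuples, highest score first. The direction
    is '+' if the word is relatively more frequent in corpus 1, '-' if it is
    relatively more frequent in corpus 2, and '=' if both rates are equal.
    """
    f1, f2 = Counter(tokens1), Counter(tokens2)
    c, d = sum(f1.values()), sum(f2.values())
    rows = []
    for word in f1.keys() | f2.keys():
        a, b = f1[word], f2[word]  # Counter returns 0 for unseen words
        rel1, rel2 = a / c, b / d
        direction = "+" if rel1 > rel2 else "-" if rel1 < rel2 else "="
        rows.append((word, log_likelihood(a, b, c, d), direction))
    # Sort by score (descending), then alphabetically for a stable order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Toy example with hypothetical sentences, not corpus data.
written = "ik en mijn moeder en ik gingen naar de winkel".split()
read = "de prinses liep naar het kasteel en de draak sliep".split()
for word, score, direction in frequency_profile(written, read)[:3]:
    print(f"{word}\t{score:.2f}\t{direction}")
```

A high score marks a word whose relative frequency differs strongly between the two corpora. The direction column shows which corpus uses the word more. With one degree of freedom, a score above 3.84 corresponds to p < 0.05. For the tiny toy lists above, no word reaches that threshold.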




  1. Balota, D., Yap, M., & Cortese, M. J.
    (2006) Visual word recognition. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics (pp. 285–376). Amsterdam: Elsevier Academic Press. https://doi.org/10.1016/B978-012369374-7/50010-9
  2. Bracken, S., & Fischel, J. E.
    (2008) Family reading behaviour and literacy skills in preschool children from low-income backgrounds. Early Education and Development, 19(1), 45–67. https://doi.org/10.1080/10409280701838835
  3. Chiu, S. I., Hong, F. Y., & Hu, H. Y.
    (2015) The effects of family cultural capital and reading motivation on reading behaviour in elementary school students. School Psychology International, 36(1), 3–17. https://doi.org/10.1177/0143034314528488
  4. Clark, C., & Teravainen, A.
    (2017) Book Ownership and Reading Outcomes. London: National Literacy Trust.
  5. Drijbooms, E., Groen, M., & Verhoeven, L.
    (2017) How executive functions predict development in syntactic complexity of narrative writing in the upper elementary grades. Reading & Writing, 30(1), 209–231. https://doi.org/10.1007/s11145-016-9670-8
  6. Evers-Vermeul, J., & Sanders, T.
    (2009) The emergence of Dutch connectives; how cumulative cognitive complexity explains the order of acquisition. Journal of Child Language, 36(4), 829–854. https://doi.org/10.1017/S0305000908009227
  7. Fayol, M., & Mouchon, S.
    (1997) Production and comprehension of connectives in the written modality: A study of written French. In C. Pontecorvo (Ed.), Writing Development: An Interdisciplinary View (pp. 193–204). Amsterdam/Philadelphia, PA: John Benjamins. https://doi.org/10.1075/swll.6.15fay
  8. Johannes, K., Wilson, C., & Landau, B.
    (2016) The importance of lexical verbs in the acquisition of spatial prepositions: The case of in and on. Cognition, 157, 174–189. https://doi.org/10.1016/j.cognition.2016.08.022
  9. Kent, S., & Wanzek, J.
    (2016) The relationship between component skills and writing quality and production across developmental levels: A meta-analysis of the last 25 years. Review of Educational Research, 86(2), 570–601. https://doi.org/10.3102/0034654315619491
  10. Meints, K., Plunkett, K., Harris, P. L., & Dimmock, D.
    (2002) What is ‘on’ and ‘under’ for 15-, 18-, and 24-month-olds? Typicality effects in early comprehension of spatial prepositions. British Journal of Developmental Psychology, 20(1), 113–130. https://doi.org/10.1348/026151002166352
  11. Penning de Vries, B., & Tellings, A.
    (forthcoming) Development of connective frequency in Dutch child-directed texts: A corpus analysis.
  12. Perfetti, C. A., & Hart, L.
    (2001) The lexical quality hypothesis. In L. Verhoeven, C. Elbro, & P. Reitsma (Eds.), Precursors of Functional Literacy (pp. 189–214). Amsterdam/Philadelphia, PA: John Benjamins.
  13. Peterson, C., & McCabe, A.
    (1987) The connective “and”: Do older children use it less as they learn other connectives? Journal of Child Language, 14(2), 375–381. https://doi.org/10.1017/S0305000900012988
  14. Rayson, P., & Garside, R.
    (2000) Comparing corpora using frequency profiling. In Proceedings of the Workshop on Comparing Corpora, 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000) (pp. 1–6). Hong Kong.
  15. Tellings, A., Hulsbosch, M., Vermeer, A., & van den Bosch, A.
    (2014) BasiLex: An 11.5 million word corpus of Dutch texts written for children. Computational Linguistics in the Netherlands, 4, 191–208.
  16. Van den Bosch, A., Busser, G. J., Daelemans, W., & Canisius, S.
    (2007) An efficient memory-based morphosyntactic tagger and parser for Dutch. In F. van Eynde, P. Dirix, I. Schuurman, & V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting (CLIN-17, Leuven) (pp. 99–114). Utrecht: LOT. Retrieved from https://ilk.uvt.nl/downloads/pub/papers/tadpole-final.pdf (last accessed September 2018).
  17. Van Gompel, M.
    (2014) FoLiA: Format for linguistic annotation. Documentation. Language and Speech Technology Technical Report Series LST-14-01, Radboud University Nijmegen.
