1887
Volume 175, Issue 1
  • ISSN 0019-0829
  • E-ISSN: 1783-1490

Abstract

We describe a large-scale effort to map English-language vocabulary to U.S. school grade levels. Our motivation is to rapidly expand graded vocabulary resources for work with native English speakers in the U.S., taking school-related influences into account rather than relying solely on corpus-frequency approaches. We report on the initial data collection effort, which mapped about 22K word forms, and compare this mapping with other recent vocabulary mapping efforts, such as age-of-acquisition norms. We then describe efforts to expand the resource automatically using linguistically motivated variables and corpus-based methods. Our current resource maps more than 126K English word forms to U.S. school grade levels. We also compare a subset of our L1-mapped data to English L2 vocabulary levels, as expressed on the CEFR scale, and find considerable overlap in the order of vocabulary learning in L1 and L2 English.

/content/journals/10.1075/itl.22025.flo
2024-02-26
2024-12-14
  • Article Type: Research Article
Keyword(s): grade levels; graded resources; lexical progression; vocabulary; word difficulty