A multi-dimensional comparison of the effectiveness and efficiency of association measures in collocation extraction

Abstract

Because collocations are ubiquitous and important in language use and learning, how to identify them effectively and efficiently has been a topic of interest. Although some studies have evaluated many of the existing association measures (AMs) used in the automatic identification of collocations, their results have been inconsistent and unclear owing to various limitations. Hence, this study makes a multi-dimensional evaluation of the effectiveness and efficiency of seven major AMs in identifying three types of collocations across five genres and seven corpora of different sizes. The results indicate that while a few AMs, such as Log Likelihood Ratio and Cubic Mutual Information (MI), are consistently more effective and efficient than the other five AMs examined, no single AM may be adequate for identifying different types of collocations across different genres and corpus sizes. Research implications are also discussed.

/content/journals/10.1075/ijcl.19111.den
2022-05-10
2022-05-26