Volume 27, Issue 3
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



The aim of collostructional analysis or, more precisely, simple collexeme analysis is to quantify the statistical association between a construction and a lexeme that occurs in a particular slot of that construction. The analysis is based on contingency tables, which ought to represent a cross-classification of the units of analysis. So far, the units of analysis have been identified either as all constructions in the corpus or as all instances of a class of constructions to which the construction belongs. In practice, it is often not possible or feasible to identify these constructions, so the sample size is typically approximated by heuristic estimates. The bottom-right cell of the contingency table is most affected by these approximations. I suggest that the units of analysis instead be defined on the word level, as the class of word forms that satisfy the restrictions on the collexeme slot of the construction.
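
To make the role of the sample size concrete, the following minimal sketch builds the 2x2 contingency table of a simple collexeme analysis and computes a one-sided Fisher exact p-value from it. The abstract does not name an association measure, so Fisher's exact test is an assumption here, chosen as one common option. The function names and all frequencies are hypothetical and serve only as illustration.

```python
from math import comb


def contingency_table(f_lex_in_cx, f_cx, f_lex, n_units):
    """Return the 2x2 table ((a, b), (c, d)) for one lexeme and one construction.

    f_lex_in_cx: tokens of the lexeme in the collexeme slot of the construction
    f_cx:        all tokens of the construction
    f_lex:       all tokens of the lexeme among the units of analysis
    n_units:     total number of units of analysis (the sample size)
    """
    a = f_lex_in_cx              # lexeme inside the construction
    b = f_cx - a                 # other lexemes inside the construction
    c = f_lex - a                # lexeme outside the construction
    d = n_units - a - b - c      # bottom-right cell: everything else
    if min(a, b, c, d) < 0:
        raise ValueError("frequencies are inconsistent with n_units")
    return (a, b), (c, d)


def fisher_attraction_p(table):
    """One-sided Fisher exact p-value, P(X >= a), under the hypergeometric null."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1 = a + b                 # construction total
    col1 = a + c                 # lexeme total
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical frequencies: only the assumed sample size differs between runs.
for n_units in (100_000, 1_000_000):
    table = contingency_table(f_lex_in_cx=30, f_cx=200, f_lex=500, n_units=n_units)
    print(n_units, table, fisher_attraction_p(table))
```

Only the bottom-right cell changes between the two runs, yet the p-value changes with it, which is why the definition of the units of analysis matters for the outcome.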




