Volume 24, Issue 4
  • ISSN 1384-6655
  • E-ISSN: 1569-9811

This paper introduces an experimental paradigm based on probabilistic evidence of the interaction between construction decisions in a parsed corpus. The approach is demonstrated using ICE-GB, a one-million-word corpus of English. It finds an interaction between successive attributive adjective phrases in noun phrases with a noun head, such that the probability of adding a further adjective phrase falls with each one added. The same pattern is much weaker for adverbs preceding a verb phrase, implying that this decline is not a universal phenomenon. Noun phrase postmodifying clauses exhibit a similar initial fall in the probability of adding successive clauses that modify the same NP head, and of embedding clauses that modify new NP heads. Successive postmodification also shows a secondary phenomenon: an increase in additive probability in longer sequences, apparently due to ‘templating’ effects. The author argues that these results can only be explained as cognitive and communicative natural phenomena acting on and within recursive grammar rules.
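
The notion of an additive probability that falls over successive additions can be illustrated with a minimal sketch. It assumes one plausible reading, not confirmed by the abstract: the probability of adding item k+1 is the number of cases with at least k+1 items divided by the number with at least k. The function names and the counts are hypothetical and are not taken from ICE-GB or from the paper. A Wilson score interval is attached to each proportion as one standard way to express its uncertainty.

```python
import math


def additive_probabilities(at_least):
    """Given at_least[k] = number of cases with at least k items,
    return the list of ratios at_least[k + 1] / at_least[k]."""
    return [at_least[k + 1] / at_least[k] for k in range(len(at_least) - 1)]


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for a proportion p observed over n cases
    (z defaults to the two-sided 95% critical value)."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts (illustration only, NOT corpus data):
# cases with at least 0, 1, 2 and 3 items respectively.
counts = [10000, 2000, 200, 10]

probs = additive_probabilities(counts)
for k, p in enumerate(probs):
    lower, upper = wilson_interval(p, counts[k])
    print(f"p(add item {k + 1}) = {p:.3f}  95% interval ({lower:.3f}, {upper:.3f})")
```

With these invented counts each ratio is smaller than the one before it, which is the shape of decline the abstract describes. Where the intervals for neighbouring steps do not overlap, the difference between them can be treated as significant. Overlapping intervals do not settle the question either way and would call for a direct test of the difference.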




