Volume 20, Issue 1
  • ISSN 1387-6732
  • E-ISSN: 1570-6001


The CELEX lexical database (Baayen, Piepenbrock & van Rijn 1995) was developed in the 1990s and records the syntactic, morphological, phonological and orthographic forms of between 50,000 and 125,000 words of Dutch, English and German. This database was used as the basis for the development of the PolyLex lexicons, which included syntactic, morphological and phonological information for around 3,000 words of Dutch, English and German. Orthographic information was subsequently added in the PolyOrth project. The PolyOrth project was based on the assumption that the underlying, lexical phonological forms could be used to derive the surface orthographic forms by means of a combination of phoneme-grapheme mappings and sets of autonomous spelling rules for each language. One complication encountered during the project was that the phonological forms in CELEX were not always genuinely underlying forms, which made deriving the orthographic forms difficult. This paper discusses the nature and status of underlying phonological forms, their relation to orthography and the problems of finding this information in databases.
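The derivation described in the abstract can be illustrated with a minimal Python sketch. It is not the PolyOrth implementation: the tiny phoneme-grapheme table, the SAMPA-style symbols, the single simplified Dutch spelling rule and the names `PHONEME_TO_GRAPHEME`, `open_syllable_rule` and `spell` are all assumptions made for illustration. The last two calls show why a surface form in place of an underlying form is a problem: Dutch hond ends in an underlying /d/ that is devoiced in pronunciation, so a mapping fed the surface form yields the wrong spelling.

```python
import re

# Toy phoneme-to-grapheme table for a few Dutch segments, keyed by
# SAMPA-style symbols. Hypothetical and deliberately tiny.
PHONEME_TO_GRAPHEME = {
    "h": "h", "m": "m", "n": "n", "d": "d", "t": "t",
    "O": "o",    # short vowel as in "hond"
    "a:": "aa",  # long vowel as in "maan"
    "@": "e",    # schwa
}

def open_syllable_rule(spelling):
    """Write a doubled vowel letter singly before one consonant plus a vowel.

    A simplified stand-in for one autonomous spelling rule of Dutch.
    """
    return re.sub(r"(aa|ee|oo|uu)(?=[bdfghjklmnprstvwz][aeiou])",
                  lambda m: m.group(1)[0], spelling)

DUTCH_RULES = (open_syllable_rule,)

def spell(phonemes, rules=DUTCH_RULES):
    """Map each phoneme to a grapheme, then apply the spelling rules in order."""
    spelling = "".join(PHONEME_TO_GRAPHEME[p] for p in phonemes)
    for rule in rules:
        spelling = rule(spelling)
    return spelling

print(spell(["m", "a:", "n"]))            # maan
print(spell(["m", "a:", "n", "@", "n"]))  # manen
print(spell(["h", "O", "n", "d"]))        # hond (underlying stem-final /d/)
print(spell(["h", "O", "n", "t"]))        # hont (surface form after final devoicing)
```

Keeping the mapping and the spelling rules as separate steps mirrors the two components named in the abstract; a realistic lexicon would need context-sensitive mappings and many more rules per language.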




  1. Baayen, Harald, Richard Piepenbrock & Hedderik van Rijn
    (1995) The CELEX lexical database, Release 2 (CD-ROM). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
  2. Benesty, Jacob, M. M. Sondhi & Yiteng Huang
    (Eds.) (2008) Springer Handbook of Speech Processing. Berlin: Springer.
    https://doi.org/10.1007/978-3-540-49127-9
  3. Brown, Dunstan & Andrew Hippisley
    (2012) Network Morphology: A Defaults-based Theory of Word Structure. Cambridge: CUP.
    https://doi.org/10.1017/CBO9780511794346
  4. Burnage, Gavin
    (1996) CELEX: A Guide for Users, The CELEX lexical database, Release 2 (CD-ROM). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
  5. Cahill, Lynne
    (2001) Semi-automatic construction of multilingual lexicons. Machine Translation Review. Electronic journal available at www.bsc.org.uk/siggroup/nalantran/mtreview/mtr-12/.
  6. Cahill, Lynne
    (1990) Syllable-based Morphology. Proceedings of the 13th International Conference on Computational Linguistics (COLING90), Vol. 3, Helsinki, Finland, August 1990, 48–53.
    https://doi.org/10.3115/991146.991155
  7. Cahill, Lynne & Gerald Gazdar
    (1999) The PolyLex Architecture: Multilingual Lexicons for Related Languages. Traitement Automatique des Langues 40.2: 5–23.
  8. Cahill, Lynne, Carole Tiberius & Jon Herring
    (2013) PolyOrth: Orthography, phonology and morphology in inheritance lexicons. Written Language and Literacy 16.2: 146–185.
    https://doi.org/10.1075/wll.16.2.02cah
  9. Carney, Edward
    (1994) A Survey of English Spelling. London: Arnold.
    https://doi.org/10.4324/9780203199916
  10. Carroll, J. & C. Grover
    (1989) The derivation of a large computational lexicon of English from LDOCE. In B. Boguraev & E. Briscoe (eds.) Computational Lexicography for Natural Language Processing, 117–134. Harlow, UK: Longman.
  11. Evans, R., P. Piwek, L. Cahill & N. Tipper
    (2008) Natural Language Processing in CLIME – a multilingual legal advisory system. Journal of Natural Language Engineering 14.1: 101–132.
    https://doi.org/10.1017/S135132490600427X
  12. Evertz, Martin & Beatrice Primus
    (2013) The graphematic foot in English and German. Writing Systems Research 5.1: 1–23.
    https://doi.org/10.1080/17586801.2013.765356
  13. Finkel, Raphael & Gregory Stump
    (2007) Principal Parts and Morphological Typology. Morphology 17.1: 39–75.
    https://doi.org/10.1007/s11525-007-9115-9
  14. Goldrick, Matthew & Brenda Rapp
    (2007) Lexical and post-lexical phonological representations in spoken production. Cognition 102: 219–260.
    https://doi.org/10.1016/j.cognition.2005.12.010
  15. Herring, Jon
    (2006) Orthography and the lexicon. PhD Dissertation, University of Brighton.
  16. Nerbonne, John
    (1998) Linguistic Databases. CSLI (ISBN: 9781575860930)
  17. New, Boris, Christophe Pallier, Marc Brysbaert & Ludovic Ferrand
    (2004) Lexique 2: A New French Lexical Database. Behavior Research Methods, Instruments, & Computers 36: 516.
    https://doi.org/10.3758/BF03195598
  18. Nunn, Anneke
    (1998) Dutch Orthography: A systematic investigation of the spelling of Dutch words. The Hague: Holland Academic Graphics.
  19. Rollings, Andrew G.
    (2004) The spelling patterns of English. Munich: Lincom.
  20. Sampson, Geoffrey
    (2015) Writing Systems: A Linguistic Introduction (2nd edn.). Stanford: Stanford University Press.
  21. Sproat, Richard
    (2012) The Consistency of the Orthographically Relevant Level in Dutch. In Martin Neef, Anneke Neijt & Richard Sproat (eds.) The Relation of Writing to Spoken Language, 35–46. Berlin, Boston: Max Niemeyer Verlag.
  22. Sproat, Richard
    (2000) A computational theory of writing systems. Cambridge: CUP.
  23. Swadesh, Morris
    (1934) The Phonemic Principle. Language 10.2: 117–129.
    https://doi.org/10.2307/409603
  24. Wells, John C.
    (1987) Computer coded phonetic transcription. Journal of the International Phonetic Association 17.2: 94–114.
    https://doi.org/10.1017/S0025100300003303
