21. A computational lexicography approach to phraseologisms
The cycle of lexicographic and linguistic work involved in compiling a computational phraseological database is divided into three phases and described in relation to the specific challenges that multi-word expressions (MWEs) pose for a lexical database. Data collection is far from complete for the MWEs found in English, and the variability of some phrases makes identifying all their occurrences in large corpora a major challenge. Formalization of the form and variability of MWEs is an interrelated process that can improve tools for data collection and other applications. Increased use of the phraseological lexical database in NLP applications can ultimately lead to further insights into the nature of MWEs and to improvements in the database. Because of the volume of lexicographic data on MWEs that still needs to be collected, analysed and formalized, and the cyclical nature of the work, the resulting lexical database should be reusable in as many applications as possible. <i>WordManager-PhraseManager</i>, the lexical resource described in the second part of the chapter, can capture the variability of MWEs in a way that allows for maximum reusability of lexical data.