23. Combined statistical and grammatical criteria for the retrieval of phraseological units in an electronic corpus
The aim of this study is to refine and optimise the mainly statistical and distributional approach to the automatic extraction of phraseological units (PUs) from text corpora, by introducing minimal linguistic elements (lemmatisation and grammatical tagging). These operations were first tested using the same corpora as in our previous research (Pamies & Pazos 2003 & 2004). This provided us with a new set of results, which we compared with the previous ones.We found that the detection ability had improved substantially, especially when dealing with verb + noun and verb + adjective collocations. This methodology was then applied to a larger corpus. Again, the results were encouraging, with phraseological densities up to 64.5% for the verb + noun category.