1887
Volume 18, Issue 4
  • ISSN 1384-6655
  • E-ISSN: 1569-9811

Abstract

This study develops a new computing method for extracting contiguous phraseological sequences (PSs) of various lengths from academic text corpora by measuring the internal associations of n-grams. We construct a new normalizing algorithm, based on a probability-weighted average, that refines the mutual information (MI) measure and improves precision in extracting PSs from corpora. The method is applied to a medium-sized corpus of academic English. Results indicate that the new MI measure yields statistics that better reveal the internal associations within an n-gram, regardless of its size. Lexico-grammatical sequences extracted with this method are more complete and less arbitrary in terms of grammar and semantics. The method can be applied to a variety of linguistic phenomena, ranging from well-established phrases to likely phrasal entities, and thus has potential practical applications in corpus-based studies of phraseology and in natural language processing.
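
The abstract does not give the formula, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that the MI of an n-gram is computed at each binary split point as log2(P(whole) / (P(left) P(right))), and that the split scores are averaged with weights proportional to P(left) P(right). The weighting scheme and the names `ngram_counts`, `split_mi`, and `weighted_mi` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every contiguous n-gram for n = 1..max_n, plus totals per length."""
    counts, totals = Counter(), Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
            totals[n] += 1
    return counts, totals


def prob(gram, counts, totals):
    """Relative frequency of an n-gram among all n-grams of the same length."""
    return counts[gram] / totals[len(gram)]


def split_mi(gram, i, counts, totals):
    """Pointwise MI between gram[:i] and gram[i:] (standard PMI definition)."""
    joint = prob(gram, counts, totals)
    left = prob(gram[:i], counts, totals)
    right = prob(gram[i:], counts, totals)
    return math.log2(joint / (left * right))


def weighted_mi(gram, counts, totals):
    """Probability-weighted average of split-point MI scores.

    ASSUMPTION: weights proportional to P(left) * P(right); the paper's
    actual normalization is not stated in the abstract.
    """
    if len(gram) < 2 or counts[gram] == 0:
        raise ValueError("need an attested n-gram with n >= 2")
    splits = range(1, len(gram))
    weights = [prob(gram[:i], counts, totals) * prob(gram[i:], counts, totals)
               for i in splits]
    scores = [split_mi(gram, i, counts, totals) for i in splits]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy corpus, invented for illustration only.
tokens = "in terms of the data in terms of the method".split()
counts, totals = ngram_counts(tokens, max_n=3)
score = weighted_mi(("in", "terms", "of"), counts, totals)
```

Because the result is a weighted average, the score of a longer n-gram always lies between its weakest and strongest split-point MI, which is one way a single statistic can stay comparable across n-gram sizes.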

/content/journals/10.1075/ijcl.18.4.03wei
2013-01-01
2019-10-21