Drawbacks and Pitfalls of Machine-Readable Texts for Linguistic Research
- Source: International Journal of Corpus Linguistics, Volume 3, Issue 2, Jan 1998, p. 211 - 228
Abstract
The paper highlights and discusses some practical drawbacks and pitfalls of computerised texts, with regard both to the databases themselves and to the software employed to codify and search them. In the first place, some corpora and databases are compiled in such a way that they can be searched and analysed only with tools that allow specific kinds of search. This often prevents scholars from carrying out their own free study of the data, thus hindering an effective, targeted analysis. Moreover, in some cases, the need for comprehensiveness leads to the codification and classification of subjective aspects such as text difficulty and the participants' social level. This subjectivity of interpretation might mislead researchers in a socially-orientated analysis. Finally, despite being highly sophisticated, the techniques employed for automated grammatical and part-of-speech tagging, as well as for semantic and prosodic parsing, appear not to be totally reliable, since mistakes in the codification of simple items are likely to occur. Each of the above thorny issues, together with some other minor matters, is illustrated with instances drawn from the author's personal linguistic research on a variety of synchronic and diachronic corpora and databases.