International Journal of Corpus Linguistics - Volume 3, Issue 2, 1998
-
The Linguistic Annotation of Corpora: The TOSCA Analysis System
Author(s): Jan Aarts, Hans van Halteren and Nelleke Oostdijk
pp. 189–210 (22)
The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.
-
Drawbacks and Pitfalls of Machine-Readable Texts for Linguistic Research
Author(s): Roberta Facchinetti
pp. 211–228 (18)
The paper highlights and discusses some practical issues related to the drawbacks and pitfalls of computerised texts with regard to both the databases themselves and the software employed to codify and search them. In the first place, some corpora and databases are compiled in such a way that they can be searched and analysed only with tools that allow specific kinds of search. This often prevents scholars from carrying out their own free study of the data, thus hindering an effective, targeted analysis. Moreover, in some cases, the need for comprehensiveness leads to the codification and classification of subjective aspects like the text difficulty and the participants' social level. This subjectivity of interpretation might mislead researchers in a socially-orientated analysis. Finally, despite being highly sophisticated, the techniques employed for automated grammatical and part-of-speech tagging as well as for semantic and prosodic parsing appear not to be totally reliable, since mistakes in the codification of simple items are likely to occur. Each of the above thorny issues, together with some other minor matters, is illustrated with instances drawn from the author's personal linguistic research on a variety of synchronic and diachronic corpora and databases.
-
Partial Parsing: Boundary Marking
Author(s): David Coniam
pp. 229–249 (21)
This paper describes a computer program which performs a particular type of grammatical/syntactic analysis: the assigning of structural boundaries between orthographic words in written English text. The Boundary Marker has been designed, in principle, as an analyser of unrestricted text and has been developed by using, as far as possible, authentic text as data for analysis. This paper first presents a brief overview of boundary marking as a method of syntactic analysis. It then describes how the program processes text and reports on the analysis of 10 000 words of text from the media. The paper concludes with a discussion of the advantages of a tightly focused analytic tool such as the Boundary Marker.
-
A LOB-Corpus-based Semantic Profile of the Adjective in English Supplementive Clauses
Author(s): Salvador Valera and Alfonso Rizo Rodriguez
pp. 251–278 (28)
One of the various forms that the expression of attribution may take in English is through a supplementive clause, a reduced structure realized by an adjective phrase hypotactically connected with a superordinate clause. The construction under study exhibits an attributive character in that the adjective predicates about the NP subject, but also possesses an adverbial import in so far as it expresses diverse circumstances relating to the main clause. This kind of structure is, however, not entirely free of constraints; in fact, not every adjective may combine with a matrix verb, and certain semantic patterns recur in these constructions. This paper surveys a substantial number of adjectives from the LOB corpus for the identification of the semantic profile proper to supplementive adjectives.
-
Structure and Usage of the Tartu University Corpus of Written Estonian
Author(s): T. Hennoste, Mare Koit, T. Roosmaa and M. Saluveer
pp. 279–304 (26)
This paper provides an overview of the first computer corpus of the Estonian language compiled at the University of Tartu. It was based on the design principles of the LOB and Brown corpora. The main part of the corpus was assembled from 1991 to 1995 and contains about 1 million textual words. It was compiled by an interdepartmental computational linguistics research group of the university. This paper gives a survey of the text groups in the corpus and of the problems the compilers had to solve, together with the proposed solutions, and outlines the main differences from the model corpora and the underlying reasons for them. These are followed by a review of the available computer routines for processing the corpus.
-
Corpus Linguistics for Application Development
Author(s): Luca Dini and Vittorio Di Tomaso
pp. 305–318 (14)
Corpus linguistics and the development of commercial NLP applications are two tightly linked activities. It is hard to conceive fast development of high quality applications without proper tools for inspecting the corpora pertaining to the application domain. At the same time, it is hard to conceive reliable corpus analysis tools that do not satisfy the standards of software engineering. In the present paper, we will prove the validity of such a concept by showing how application development at CELI benefited from corpus-oriented tools and how these corpus-oriented tools have been produced as a by-product of the technology developed for real applications.