Lingvisticæ Investigationes - Volume 22, Issue 1-2, 1998
Les graphes INTEX
Author(s): Max Silberztein, pp. 3–29 (27)
The basic tools of INTEX are finite-state transducers represented by graphs. We show how these graphs are used to describe various linguistic phenomena. Graphs that represent elementary phenomena are re-used in other, more complex graphs. Linguists can thus build massive libraries of graphs that handle a variety of phenomena, from vocabulary and morphology to the syntax of natural languages. INTEX provides linguists with tools to easily edit, maintain, debug and validate any piece of the description by applying graphs, or sets of graphs, to large texts.
Première Partie: Traitement de la coordination à l’intérieur des groupes nominaux
Author(s): Catherine Domingues, pp. 33–57 (25)
Coordination within noun phrases is a very general phenomenon in technical corpora. These noun phrases are composed of a noun head followed by one or more modifiers, and the coordination can affect either of the elements. The purpose is to make the noun phrases complete on both sides of the coordination, in order to improve recall in automatic interrogation. The tools are provided by INTEX: dictionaries, dictionaries of compounds, and the software for writing transducers. First, coordinated noun phrases are classified according to a typology. Then we present rewrite rules to handle the agreement of the modifier, the use of the possessive determiner, the repetition of the noun head within the modifier, the determiner or preposition zeroing in the right part of the coordination, and the construction symmetry within the noun phrases. Finally, we apply two rules and show the results. Not all the rules have yet been tested, but provisional conclusions can be drawn from this demonstration.
Nouvelles concordances pour l’enseignement des langues
Author(s): Mylene Garrigues, pp. 59–69 (11)
The use of concordances in the teaching or learning of foreign languages is a common practice. This has developed into a distinct field of research for language educationalists. This paper shows how the features of the INTEX system offer new possibilities in this domain.
Lemmatization of compound tenses in English
Author(s): Maurice Gross, pp. 71–122 (52)
We generalize the process of lemmatization of verbs to their compound tenses. Usually, lemmatization is limited to verbs conjugated by means of suffixes; tense auxiliaries and modal verbs (e.g. I have left, I am leaving, I could leave) are ignored. We have constructed a set of 83 finite-state grammars which parse auxiliary verbs and thus recognize the ‘head verb’, that is, the lemma. We generalize the notion of auxiliary verb to verbs with sentential complements which have transformed constructions (e.g. I want to go) that can be parsed in exactly the same way as tense auxiliaries or modal verbs. Ambiguities arise, in particular because adverbial inserts occur inside the compound verbs. We show how local grammars describing nominal contexts can be used to reduce the degree of ambiguity.
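As a rough illustration of the idea in this abstract, the toy sketch below skips auxiliaries, modals and adverbial inserts to reach the head verb and then looks up its lemma. It is not the authors' set of 83 finite-state grammars: the word lists `AUXILIARIES`, `ADVERB_INSERTS` and `LEMMAS`, and the function `head_verb_lemma`, are hypothetical and deliberately tiny.

```python
# Hypothetical word lists, far smaller than any real resource.
AUXILIARIES = {"have", "has", "had", "am", "is", "are", "was", "were",
               "be", "been", "could", "will", "would"}
ADVERB_INSERTS = {"never", "already", "often"}
LEMMAS = {"left": "leave", "leaving": "leave", "leave": "leave"}


def head_verb_lemma(tokens):
    """Skip leading auxiliaries and adverbial inserts, then return the
    lemma of the first remaining token (None if it is not in LEMMAS)."""
    for token in tokens:
        word = token.lower()
        if word in AUXILIARIES or word in ADVERB_INSERTS:
            continue
        return LEMMAS.get(word)
    return None


for phrase in ("have left", "am leaving", "could never leave"):
    print(phrase, "->", head_verb_lemma(phrase.split()))
```

A flat lookup like this cannot resolve the ambiguities the abstract mentions; it only shows why the auxiliary sequence has to be parsed before the lemma can be identified.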
Linguistique comparée, dictionnaires électroniques et INTEX
Author(s): Jacques Labelle, pp. 123–133 (11)
The automatic analysis of French texts using INTEX produces good results as long as the French dictionaries are complete. But Quebec French texts have a lot of linguistic particularities and pose specific problems. So the building of electronic dictionaries of Quebec French is essential. Such dictionaries have been in the process of construction over the last few years. INTEX appears to be a very efficient tool for comparing dictionaries and finding new words, or new uses of words, that must be added to these dictionaries before they can be integrated into the INTEX system.
Le lexique du film La Haine: Analyse automatique avec le système INTEX
Author(s): Lidia Martino, pp. 135–142 (8)
Applying INTEX to a corpus allows all the new words (i.e. words that are not in INTEX dictionaries) that appear in it to be listed. This function is particularly useful in the retrieval of slang terms from corpora such as film scripts. We take here the example of the French film La Haine.
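The listing of new words described above can be pictured as a set difference between the word forms of a text and the entries of a dictionary. The sketch below is only an approximation of that idea in plain Python, not INTEX itself: the function `unknown_words`, the toy dictionary and the sample sentence are all invented for illustration.

```python
import re


def unknown_words(text, dictionary):
    """Return the sorted word forms of `text` that are absent from
    `dictionary`, comparing case-insensitively."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    known = {entry.lower() for entry in dictionary}
    return sorted(set(tokens) - known)


# Hypothetical toy dictionary and sentence.
toy_dictionary = {"le", "film", "est", "un", "bon"}
sample = "Le film est un bon keuf"
print(unknown_words(sample, toy_dictionary))
```

A real system would also handle inflected forms and compounds, which a flat set of word forms does not cover.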
A study of ambiguity using INTEX
Author(s): Morris Salkoff, pp. 143–154 (12)
The resolution of lexical ambiguity presents a problem for researchers in machine translation. Since many words are multiply ambiguous, and it is frequently the case that each of their senses has a different translation into another language, the question of the proper resolution of the translation of an ambiguous word would seem to be central to the construction of a program of machine translation. However, the study undertaken here of the occurrences of three ambiguous words in current text (the newspaper Le Monde, 1992) shows that up to 80% of the occurrences of these words are in collocations (idioms) or frozen expressions that have an unambiguous translation into English. This indicates that the problem of word ambiguity may in fact be largely exaggerated, and will present a real difficulty in only a small percentage of the occurrences of such words.
Vers une structuration de dictionnaires servant à la TAO
Author(s): Dieter Seelbach, pp. 155–172 (18)
We have investigated the contrastive French-German aspects of about 50 simple nominal and compound adjectival predicates, and designed their entries for an electronic dictionary for computer-assisted translation. Their co-occurrence with support verbs expressing different aspectual semantic functions is coded; in the case of the nouns, their co-occurrence with determiners is also coded, as is the syntactic and semantic specification of their arguments in terms of semantic classes called “object classes”. As an illustration of the envisaged translation methodology based on lexicon-grammar, the syntactic tagging of different formal types of simple and compound predicates, and the semantic tagging of their arguments, are demonstrated with reference to a German newspaper text and its French translation. INTEX will be used not only for identifying and translating compound words or multi-word expressions, but also, as an additional but necessary tool, for finding object classes together with their appropriate predicates and support verbs, because object classes are clearly syntactically defined.
INTEX pour l’annotation semi-automatique d’un corpus d’anaphores
Author(s): Agnès Tutin, pp. 173–189 (17)
Anaphors constitute a well-known problem in automatic text generation and natural language understanding. Using corpora to deal with such phenomena could help to develop robust processing techniques. Building such resources is, though, a tedious and time-consuming task and could more easily be accomplished by partial automation. In this paper, we show how the INTEX system can be used for this task. We show that in a newspaper corpus (in this case, le Monde Diplomatique), discursive grammatical anaphors can easily be located via associated linguistic features. A series of transducers generating tags for categories and functions can thus be built, and constitutes an efficient pre-processing stage (though manual checking remains necessary). The heuristics, quickly and easily developed, are specific to the task. The study goes on to show, however, that discarding non-anaphoric pronouns is not straightforward in the case of non-referential personal pronouns or indefinite pronouns, and that the tagging of the grammatical function seems limited in the absence of real syntactic processing.
Deuxième Partie: Quelques remarques sur un dictionnaire électronique d’adverbes composés en espagnol
Author(s): Xavier Blanco and Dolors Català, pp. 213–232 (20)
Within the framework of studies using the LADL system of electronic dictionaries, the group Applied Linguistics in Romance Languages of the UAB has undertaken the construction of an electronic dictionary for frozen compound adverbs. This dictionary completes the DELACS (Dictionary for Compound Words of Spanish). This paper briefly presents the characteristics of each type of frozen compound adverb and also the choices that underlie the development of the tables in which they are recorded. Details are also given about the state of the electronic dictionary of frozen compound adverbs currently available on INTEX. Since this is a preliminary study, the final section is devoted to identifying possible future developments, rather than drawing conclusions.
Développer des grammaires locales de levée d’ambiguïtés pour INTEX
Author(s): Anne Dister, pp. 233–247 (15)
It is possible for an INTEX user to construct his/her own grammars of disambiguation. In this paper, we shall identify the elementary principles required to construct such grammars, and the specific problems in the management of a library of graphs. In the second part of the paper, we shall explain how the grammars that we have already constructed can be applied to a text, and which results are possible.
Ressources linguistiques du portugais implémentées sous INTEX
Author(s): Elisabete Ranchhod, pp. 263–277 (15)
Large-coverage dictionaries and grammars have been built for Portuguese using LADL methods and formats. They are now used by INTEX in general text processing operations. We present the main features of such linguistic data, and we refer to their maintenance and extension. The presentation is illustrated by various examples of automatic text parsing based on those dictionaries and grammars.
Un dictionnaire de noms propres pour INTEX: Les noms propres géographiques
Author(s): Denis Maurel and Odile Piton, pp. 279–289 (11)
In this paper, we begin by presenting the electronic relational dictionary of proper names created within the ‘Prolex’ project. We demonstrate the algorithm used for linking proper names in a text with reference to a particular application; this algorithm uses a specific transducer. We then describe our ‘Proper Name Dictionary’ for INTEX software. This currently contains just geographic proper names: hydronyms, toponyms, names of inhabitants and also toponymic adjectives.
Normalisation des textes anglais
Author(s): Katia Zellagui, pp. 291–307 (17)
The present study deals with the pre-processing of texts. This pre-processing is performed in three steps, which are: the segmentation of the texts into textual units (sentences), the re-writing of contracted forms into a standard form, and the tagging of unambiguous compounds. We describe here two of the three steps: text segmentation, and the re-writing of contracted forms. The segmentation of the texts into textual units is made possible by using the transducer Sentence. The re-writing of contracted forms into their standard forms is done by applying the transducer Normalisation. We describe in detail the various steps involved in the development of both transducers.
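The two steps described in this abstract, sentence segmentation and the rewriting of contracted forms, can be approximated with regular expressions. The sketch below is a simplified stand-in and not the Sentence or Normalisation transducers themselves: the `CONTRACTIONS` table and both function names are hypothetical, and the table covers only a handful of unambiguous cases.

```python
import re

# Hypothetical, deliberately tiny table of unambiguous contractions.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "i'm": "I am",
}

# One alternation over all listed contractions, matched case-insensitively.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(form) for form in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def expand_contractions(sentence):
    """Rewrite every contraction listed in CONTRACTIONS into its full form."""
    return _CONTRACTION_RE.sub(
        lambda match: CONTRACTIONS[match.group(0).lower()], sentence
    )


text = "I can't stay. They don't know!"
print([expand_contractions(s) for s in split_sentences(text)])
```

This version does not restore capitalisation (a sentence-initial "Don't" becomes "do not"), splits wrongly after abbreviations such as "e.g.", and leaves ambiguous forms such as "he's" untouched, which is the kind of case a carefully built transducer has to address.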
Troisième Partie: Fonctionnalités INTEX dans l’outil d’aide à la traduction: LexPro CD Databank
Author(s): Anne Chrobot, pp. 311–325 (15)
We present an industrial application, in the field of translation assistance, of some features of INTEX, a system for processing large corpora. The application is the LexPro CD Databank software, developed and marketed by the company LCI (Jouy-en-Josas, France), a partner of the LADL.
GlossaNet: Parsing a web site as a corpus
Author(s): Cédrick Fairon, pp. 327–340 (14)
GlossaNet is an automated system that monitors Web sites. On dates and at intervals selected by the user, GlossaNet downloads the Web site, converts it to an electronic corpus and uses the INTEX programs (M. Silberztein 1993) and the linguistic resources of the LADL (electronic dictionaries and libraries of local grammars) to parse it. Once the software has been set up, it automatically repeats the task at regular periods of time (as the Web site is updated). Results, if any, are e-mailed to the user.
Elimination of lexical ambiguities by grammars: The ELAG system
Author(s): Éric Laporte and Anne Monceaux, pp. 341–367 (27)
We present a new, INTEX-compatible formalism for the description of distributional constraints, ‘ELAG’ (Elimination of Lexical Ambiguities by Grammars). The constraints may be checked against text, and the lexical ambiguity of the text may thus be partly resolved. We describe and exemplify the main properties of ELAG with the aid of simple rules, formalizing exploitable constraints. We specify in detail the effect of applying an ELAG rule or grammar to a text. We examine the practical properties of the formalism from the point of view of a rule writer. We describe our separate, INTEX-compatible evaluation procedure for the lexical disambiguation results.