Named Entities: Recognition, classification and use

Lingvisticæ Investigationes - Volume 30, Issue 1, 2007

Foreword

pp.: 1–2 (2)

https://doi.org/10.1075/li.30.1.01for

A survey of named entity recognition and classification

Author(s): David Nadeau and Satoshi Sekine

pp.: 3–26 (24)

https://doi.org/10.1075/li.30.1.03nad
This survey covers fifteen years of research in the Named Entity Recognition and Classification (NERC) field, from 1991 to 2006. We report observations about languages, named entity types, domains and textual genres studied in the literature. From the start, NERC systems have been developed using hand-made rules, but now machine learning techniques are widely used. These techniques are surveyed along with other critical aspects of NERC such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of words in a document. Evaluation techniques, ranging from intuitive exact match to very complex matching techniques with adjustable cost of errors, are an indisputable key to progress.
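As an illustration of the "intuitive exact match" evaluation mentioned in the abstract, the following minimal sketch scores a prediction as correct only when its boundaries and type both equal a gold annotation. The tuple layout `(start, end, type)`, the function name and the sample data are assumptions made for illustration; they are not taken from the survey.

```python
def exact_match_scores(gold, predicted):
    """Return (precision, recall) under exact-match scoring.

    Each entity is a hypothetical (start, end, type) tuple; a prediction
    counts as correct only if all three fields match a gold entity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


# Invented sample annotations: one boundary error and one spurious entity.
gold = [(0, 2, "PER"), (5, 6, "ORG"), (9, 10, "LOC")]
pred = [(0, 2, "PER"), (5, 7, "ORG"), (9, 10, "LOC"), (12, 13, "PER")]
precision, recall = exact_match_scores(gold, pred)
```

Under this scheme the ORG prediction with a wrong right boundary earns no credit at all, which is the kind of strictness that more complex matching schemes with adjustable error costs relax.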

Diversity in logarithmic opinion pools

Author(s): Andrew D.M. Smith and Miles Osborne

pp.: 27–47 (21)

https://doi.org/10.1075/li.30.1.04smi
Conditional random fields are state-of-the-art models for sequencing tasks such as named entity recognition. However, being globally conditioned, they have a tendency to overfit to a greater extent than other sequencing models. We introduce an approach to combat this overfitting called a logarithmic opinion pool (LOP). A LOP consists of a weighted combination of constituent models. We present the theory behind LOPs, and show that effective LOPs require constituent models that are diverse from one another. We examine different ways to introduce such diversity, including an approach that involves training the constituent models together, interactively. Our results show that, as expected from the underlying theory, explicitly optimising for constituent model diversity can improve performance over standard approaches to regularisation.
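The "weighted combination of constituent models" can be sketched as a normalised weighted geometric mean of the constituent distributions. The example below is a simplification that pools distributions over a single label rather than over whole label sequences as a CRF would. The function name, the label set and the probabilities are hypothetical, and every probability is assumed to be strictly positive.

```python
import math


def log_opinion_pool(distributions, weights):
    """Pool label distributions as a normalised weighted geometric mean.

    distributions: list of dicts mapping label -> probability (all > 0).
    weights: one non-negative weight per distribution, assumed to sum to 1.
    """
    labels = list(distributions[0])
    # A weighted sum of log-probabilities is the log of the weighted product.
    log_scores = {
        y: sum(w * math.log(p[y]) for p, w in zip(distributions, weights))
        for y in labels
    }
    # Renormalise so the pooled scores form a probability distribution.
    z = sum(math.exp(s) for s in log_scores.values())
    return {y: math.exp(s) / z for y, s in log_scores.items()}


# Two invented constituent models that disagree on the most likely label.
model_a = {"PER": 0.7, "ORG": 0.2, "O": 0.1}
model_b = {"PER": 0.4, "ORG": 0.5, "O": 0.1}
pooled = log_opinion_pool([model_a, model_b], [0.5, 0.5])
```

Because the pool multiplies probabilities, a label that any constituent model rates as very unlikely is pulled down sharply, which is one intuition for why diverse constituents matter.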

Handling conjunctions in named entities

Author(s): Pawel Mazur and Robert Dale

pp.: 49–68 (20)

https://doi.org/10.1075/li.30.1.05maz
Although the literature contains reports of very high accuracy figures for the recognition of named entities in text, there are still some named entity phenomena that remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings, and explore the use of a supervised machine learning approach to conjunction disambiguation trained on a very limited set of ‘name internal’ features that avoids the need for expensive lexical or semantic resources. We achieve 84% correctly classified examples using k-fold evaluation on a data set of 600 instances. We argue that further improvements are likely to require the use of wider domain knowledge and name external features.

Complex named entities in Spanish texts: Structures and properties

Author(s): Sofía N. Galicia-Haro and Alexander Gelbukh

pp.: 69–94 (26)

https://doi.org/10.1075/li.30.1.06gal
We present a linguistic analysis of Named Entities in Spanish texts. Our work focuses on determining the structure of complex proper names: names with coordinated constituents, names with prepositional phrases, and names formed by several capitalized content words. We analyze circa 49,000 examples obtained from Mexican newspapers, detail their structure, and give some notions about the context surrounding them. Since named entities belong to an open class of words, new ones are created daily, so the challenge for a named entity recognizer is to determine precisely the boundaries of new entity names in any text and to analyze their components thoroughly for deep semantic analysis. Knowing their general classes of structure, it should be possible to derive useful heuristics or a specific grammar for natural language processing applications.

Named Entity Recognition and transliteration in Bengali

Author(s): Asif Ekbal, Sudip Kumar Naskar and Sivaji Bandyopadhyay

pp.: 95–114 (20)

https://doi.org/10.1075/li.30.1.07ekb
The paper reports on the development of a Named Entity Recognition (NER) system for Bengali using a tagged Bengali news corpus, and on the subsequent transliteration of the recognized Bengali Named Entities (NEs) into English. Three different NER models have been developed. A semi-supervised learning method has been adopted to develop the first two models, one without linguistic features (Model A) and the other with linguistic features (Model B). The third one (Model C) is based on a statistical Hidden Markov Model. A modified joint source-channel model has been used along with a number of alternatives to generate the English transliterations of Bengali NEs and vice versa. The transliteration models learn the mappings from the bilingual training sets, optionally guided by linguistic knowledge in the form of conjuncts and diphthongs in Bengali and their representations in English. The NER system has demonstrated the highest average Recall, Precision and F-Score values of 89.62%, 78.67% and 83.79% respectively in Model C. Evaluation of the proposed transliteration models demonstrated that the modified joint source-channel model performs best in terms of evaluation metrics for person and location names for both Bengali to English (B2E) and English to Bengali (E2B) transliteration. The use of linguistic knowledge during training of the transliteration models improves performance.
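The three figures reported for Model C are mutually consistent if the F-Score is read as the balanced harmonic mean of precision and recall. That reading is an assumption, since the abstract does not state the formula. A minimal check, with a hypothetical function name:

```python
def f_score(precision, recall, beta=1.0):
    """F-measure; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract for Model C, in percent.
model_c_f = f_score(precision=78.67, recall=89.62)
print(round(model_c_f, 2))
```

The computed value rounds to the 83.79% given in the abstract.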

A note on the semantic and morphological properties of proper names in the Prolex project

Author(s): Duško Vitas, Cvetana Krstev and Denis Maurel

pp.: 115–133 (19)

https://doi.org/10.1075/li.30.1.08vit
In this paper we present a linguistic approach to the analysis of proper names. The basic assumption of our approach is that proper names are linguistic units of text that should be treated using the same methods that are applied to text in its totality. We illustrate the inflectional and derivational properties of simple and multi-word proper names using Serbian as an example, and describe how these properties have been formalized in order to develop e-dictionaries of the DELA type. To support multilingual applications, we have developed a model of a multilingual relational dictionary of proper names based on an ontology, as well as an actual database. Finally, we outline how the developed dictionaries and database can be used in real monolingual and multilingual applications, such as information extraction.

Cross-lingual Named Entity Recognition

Author(s): Ralf Steinberger and Bruno Pouliquen

pp.: 135–162 (28)

https://doi.org/10.1075/li.30.1.09ste
Named Entity Recognition and Classification (NERC) is a known and well-explored text analysis application that has been applied to various languages. We present an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The proposed application goes beyond the state of the art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to automatically link related news documents across languages for all 45 language pair combinations. While state-of-the-art approaches for cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require a minimal amount of monolingual language-specific effort. The development of resources for additional languages is therefore kept to a minimum and new languages can be plugged into the system effortlessly. The presented online news analysis application is fully functional and had, by the end of 2006, reached average usage statistics of 600,000 hits per day.
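The figure of 45 language pair combinations follows from choosing unordered pairs out of ten languages. The short sketch below reproduces the count; the language identifiers are placeholders, since the abstract does not name the ten languages.

```python
from itertools import combinations

# Placeholder identifiers standing in for the ten unnamed languages.
languages = [f"lang{i}" for i in range(1, 11)]

# Unordered pairs: (lang1, lang2) and (lang2, lang1) count once.
language_pairs = list(combinations(languages, 2))
print(len(language_pairs))
```

The count grows quadratically with the number of languages, which is why an approach that needs a bilingual resource for every pair becomes costly as languages are added.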