Named Entities: Recognition, classification and use
  • ISSN 0378-4169
  • E-ISSN: 1569-9927

Abstract

This paper reports on the development of a Named Entity Recognition (NER) system for Bengali using a tagged Bengali news corpus, and on the subsequent transliteration of the recognized Bengali Named Entities (NEs) into English. Three NER models have been developed. The first two use a semi-supervised learning method, one without linguistic features (Model A) and the other with linguistic features (Model B). The third (Model C) is based on a statistical Hidden Markov Model. A modified joint source-channel model, along with a number of alternative models, has been used to generate English transliterations of Bengali NEs and vice versa. The transliteration models learn the mappings from bilingual training sets, optionally guided by linguistic knowledge in the form of Bengali conjuncts and diphthongs and their representations in English. The NER system achieved its highest average Recall, Precision and F-Score values, 89.62%, 78.67% and 83.79% respectively, with Model C. Evaluation of the proposed transliteration models showed that the modified joint source-channel model performs best on the evaluation metrics for person and location names, for both Bengali to English (B2E) and English to Bengali (E2B) transliteration. Using linguistic knowledge during training of the transliteration models improves performance.
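The three figures reported for Model C are related to one another, which a short calculation can illustrate. The sketch below assumes the F-Score is the balanced F1 measure, that is, the harmonic mean of Precision and Recall; the abstract does not state which F-measure was used, and the function name `f_score` is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the abstract's "F-Score" is F1 (equal weight on
    precision and recall). Inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for Model C (in percent).
recall = 89.62
precision = 78.67

print(f"{f_score(precision, recall):.2f}")  # prints 83.79
```

Under that assumption, the reported Precision and Recall reproduce the reported F-Score of 83.79%.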

/content/journals/10.1075/li.30.1.07ekb
2007-01-01
2025-02-13
  • Article Type: Research Article