
Full text loading...
This paper reports a voted Named Entity Recognition (NER) system that exploits appropriate unlabeled data. Initially, we develop NER systems using the supervised machine learning algorithms such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). Each of these models makes use of the language independent features in the form of different contextual and orthographic word-level features along with the language dependent features extracted from the Part-of-Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method are also used as the features in each of the classifiers. A semi-supervised method is proposed to describe the measures to automatically select effective unlabeled documents as well as sentences from the unlabeled data. Finally, the supervised models are combined together into a final system by defining appropriate weighted voting technique. Experimental results for a resource-poor language like Bengali show the effectiveness of the proposed approach with the overall recall, precision and F-measure values of 93.81%, 92.18% and 92.98%, respectively.