Volume 13, Issue 2
  • ISSN 1384-6655
  • E-ISSN: 1569-9811

Abstract

This paper presents a hybrid model for part-of-speech (POS) guessing of Chinese unknown words. Most previous studies on this task have developed a single unified statistical model for all Chinese unknown words and have rejected rule-based models without testing them. We argue that models drawing on different sources of information about unknown words, both structural and contextual, can be effective for handling different types of unknown words. We propose a rule-based model that uses information about the type, length, and internal structure of unknown words, and combine it with two existing statistical models that use, respectively, the POS context and the component characters of unknown words. By combining the complementary strengths of these three models, the hybrid model achieves an accuracy of 89%, a significant improvement over the best result reported in previous studies.
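
To make the idea of combining three information sources concrete, the following is a minimal illustrative sketch, not the paper's implementation. The abstract does not say how the models are combined, so the priority cascade used here (rules first, then POS context, then component characters) is an assumption. All function names, the toy rules, and the Penn Chinese Treebank style tags (`CD`, `NN`, `VV`, `AD`) are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Optional, Sequence

# A guesser receives the unknown word and the POS tags around it
# (here: previous tag, next tag) and returns a tag, or None to abstain.
Guesser = Callable[[str, Sequence[str]], Optional[str]]


def rule_based_guess(word: str, context: Sequence[str]) -> Optional[str]:
    """Toy structural rules on type, length, and internal structure (hypothetical)."""
    if word.isdigit():
        return "CD"  # numeric-type words are treated as cardinal numbers
    if len(word) == 3 and word.endswith("性"):
        return "NN"  # hypothetical rule: three-character word ending in 性
    return None  # no rule applies; let a later model decide


def context_guess(word: str, context: Sequence[str]) -> Optional[str]:
    """Stand-in for a statistical model over the POS context (hypothetical)."""
    if context and context[0] == "AD":
        return "VV"  # toy preference: a verb after an adverb
    return None


def character_guess(word: str, context: Sequence[str]) -> Optional[str]:
    """Stand-in for a statistical model over component characters (hypothetical)."""
    return "NN" if word else None  # toy default: guess a noun


def hybrid_guess(
    word: str, context: Sequence[str], guessers: Sequence[Guesser]
) -> Optional[str]:
    """Assumed combination strategy: return the first non-abstaining guess."""
    for guess in guessers:
        tag = guess(word, context)
        if tag is not None:
            return tag
    return None


CASCADE = (rule_based_guess, context_guess, character_guess)

if __name__ == "__main__":
    for word, context in [("2008", ("NN", "NN")), ("上网", ("AD", "NN"))]:
        print(word, hybrid_guess(word, context, CASCADE))
```

In this sketch each model may abstain, so a word that no structural rule covers falls through to the contextual and character-based stand-ins. Other combination schemes, such as voting or confidence weighting, would fit the same interface.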

/content/journals/10.1075/ijcl.13.2.03lu
2008-01-01
2025-04-28