Volume 14, Issue 1
  • ISSN 1877-7031
  • E-ISSN: 1877-8798



Many Chinese characters have more than one written form, owing to the complex way characters were created and the long evolution of the writing system. Most existing Chinese dictionaries list these variant forms but do not explain systematically why a specific character is a variant of another; they cite only a few older key bibliographies, many of which are themselves dictionaries of various forms. In this article we present a new theory and practice for determining whether one Chinese character is a variant of another, and show how a dictionary of variant characters can be derived automatically, using artificial intelligence techniques, from a corpus of ancient Chinese texts totaling 2.3 billion characters. Results show that, among over 74,000 identified instances of variant character groups, more than 20,000 are newly found by our algorithm. We have compiled all the instances into a dictionary and call it (異體字詞典, ). The key insight of our theory is to find synonymous words with variant characters. The dictionary has been online for several years, and anyone can freely access and edit it, much as they would on Wikipedia.
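The key insight stated above, finding synonymous words that differ only in a variant character, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not describe in detail. It assumes that pairs of synonymous words are already available (for example, from some embedding-based similarity step), and the function name `variant_candidates` and the toy word pairs are hypothetical. It counts how often two characters occupy the single differing position in otherwise identical synonymous words.

```python
from collections import Counter


def variant_candidates(synonym_pairs):
    """Count character pairs that are the only difference between two
    equal-length words assumed to be synonymous.

    `synonym_pairs` is an iterable of (word_a, word_b) tuples. How such
    pairs are obtained is outside this sketch (an assumption).
    """
    counts = Counter()
    for word_a, word_b in synonym_pairs:
        # Only compare distinct words of the same length, position by position.
        if len(word_a) != len(word_b) or word_a == word_b:
            continue
        diffs = [(x, y) for x, y in zip(word_a, word_b) if x != y]
        # Exactly one differing position suggests a variant-character candidate.
        if len(diffs) == 1:
            # Sort so that (x, y) and (y, x) are counted as the same pair.
            counts[tuple(sorted(diffs[0]))] += 1
    return counts


# Toy input (hypothetical): word pairs assumed to be synonymous.
toy_pairs = [("羣臣", "群臣"), ("羣書", "群書"), ("羣", "群臣")]
candidates = variant_candidates(toy_pairs)
```

A character pair that recurs across many distinct word pairs is a stronger candidate than one seen once. Any count threshold for accepting a pair would be a further assumption, not something the abstract states.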




