International Journal of Chinese Linguistics - Volume 11, Issue 1, 2024
-
Transitional probability between characters as a component of sentence processing in Chinese
Author(s): Tianlin Wang and Matthew J. Cooper Borkenhagen
pp.: 5–29 (25)
Abstract: This study explores the role of transitional probability (TP) in sentence processing in Chinese, a writing system that presents unique challenges due to its character-based structure and lack of word boundaries. The research investigates how the statistical regularities of character meaning, as captured by TP, aid in word segmentation and impact reading comprehension. Using a moving window task, the study examines the processing speed of characters in high versus low TP conditions. Findings reveal that characters in high TP bigram conditions (indicating a consistent semantic association within a bigram) are processed more quickly, underscoring the importance of this statistical property of characters in Chinese sentence reading. These findings challenge conventional notions in Chinese linguistics concerning the relationship between characters, morphemes, and semantics, and suggest an alternative perspective on (and the need for reevaluation of) character-level semantics. The study also highlights the influence of prosodic context on reading speed, indicating that anticipatory linguistic patterns shape reader processing.
-
Are separable words words or phrases?
Author(s): Quansheng Xia and Ai Wang
pp.: 30–55 (26)
Abstract: Both traditional linguistics and psycholinguistics have extensively explored which category separable words belong to, yet opinions still differ. Building upon previous research, this study selects verb-complement structures as its focal point. Based on the number of internally insertable elements, these structures are categorized into verb-complement compounds, verb-complement compact structures, verb-complement loose structures, and verb-complement phrases. The study compares how the four types of structures are processed with and without inter-component spacing, thereby examining their “disconnected” and “connected” states. Experimental results indicate that regardless of the insertion of spaces, the reaction times for processing verb-complement compounds, compact structures, and loose structures are shorter than those for phrases. In the comparison of presence and absence of spaces, compounds and compact structures exhibit greater consistency, whereas no significant differences are observed between loose structures and phrases. This suggests that the processing of verb-complement compact structures closely resembles that of words, while the processing of loose structures embodies characteristics of both compounds and phrases, yet differs from both words and phrases. This study demonstrates that, based on the degree of internal expansion, separable words can be further classified into subcategories that exist in a transitional state between words and phrases, forming a continuum with compounds and phrases.
-
Examining the role of distributional information and structural types in multiword sequence processing by Chinese preschool children
Author(s): Lu Wang, Wenbo Yu, Yiran Peng and Dandan Liang
pp.: 56–93 (38)
Abstract: Multiword sequences (MWSs) are units between words and sentences, which can help people to achieve native-like proficiency in a language. However, the extent to which Chinese native (L1) speakers, especially preschool children in the midst of language development, comprehend and process MWSs remains uncertain. While previous research has primarily focused on the distributional properties of MWSs, whether Chinese preschool children exhibit sensitivity to the distributional information of MWSs requires further examination. In addition, the potential influence of structural types on such sensitivity has received limited attention. This study examined Chinese preschool children’s sensitivity to distributional information and structural types of MWSs when processing Mandarin MWSs. Participants performed an imitating-production task. Linear mixed-effects models revealed that children were sensitive to two types of distributional information of MWSs: MWS frequency and MWS contingency. Intriguingly, no evidence was found to suggest sensitivity to the structural types of MWSs. Furthermore, the findings demonstrated not only the continuous effects of MWS frequency and contingency but also an interaction between these two factors. This study thus indicates that during the processing of MWSs, the performance of Chinese preschool children is influenced by the distributional information of MWSs while remaining unaffected by structural types.
-
Rethinking tokenization
Author(s): Jinbiao Yang
pp.: 94–109 (16)
Abstract: Tokenization significantly influences the performance of language models (LMs). This paper traces the evolution of tokenizers from word-level to subword-level, analyzing how they balance tokens and types to enhance model adaptability while controlling complexity. Although subword tokenizers like Byte Pair Encoding (BPE) overcome many word tokenizer limitations, they encounter difficulties in handling non-Latin languages and depend heavily on extensive training data and computational resources to grasp the nuances of multiword expressions (MWEs). This article argues that tokenizers, more than mere technical tools, should draw inspiration from cognitive science research on human language processing. The study then introduces the “Principle of Least Effort” from cognitive science, which holds that humans naturally seek to reduce cognitive effort, and discusses the benefits of this principle for tokenizer development. Based on this principle, the paper proposes that the Less-is-Better (LiB) model could be a new approach for LLM tokenizers. The LiB model can autonomously learn an integrated vocabulary consisting of subwords, words, and MWEs, which effectively reduces the numbers of both tokens and types. Comparative evaluations show that the LiB tokenizer outperforms existing word and BPE tokenizers, presenting an innovative method for tokenizer development and hinting that future cognitive science-based tokenizers may be more efficient.
-
汉语句子的基本单位刍议 [Notes on the basic unit of Chinese sentences]
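Author(s): Jinman Li (李金满)
pp.: 110–129 (20)
Abstract: Defining the “word” in Chinese has long been a thorny topic. Starting from the processing units used for Chinese relative clauses, which have drawn wide attention from scholars in China and abroad, this paper draws on evidence from multiple sources to examine whether the basic unit of analysis for Chinese sentences is the “word”. Based on the experimental stimuli used in earlier self-paced reading studies of Chinese relative clauses, the paper first compares the processing units adopted within the same experiment, within the same study, and across studies of the same type. It then brings together evidence from several research areas, including eye-tracking experiments, error-tolerance phenomena, natural language processing, and Chinese-English comparison of word formation, to re-examine the question of the basic unit of Chinese sentences. Finally, from a linguistic perspective, we trace how the views of scholars in China and abroad on the “character” (字) and the “word” (词) in Chinese have evolved, and argue that the unit of the Chinese sentence can be considered from a typological perspective, in search of a more reasonable and generally applicable solution that advances research in language-related fields.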
-
Review of Jiang (2014): Advances in Chinese as a second language: Acquisition and processing
Author(s): Qi Sun and Jamie Gahtan
pp.: 130–142 (13)
This article reviews Advances in Chinese as a second language: Acquisition and processing.
-
Review of Liang (2022): Interventions for children with language disorders
Author(s): Yongtao Xiao
pp.: 143–149 (7)
This article reviews Interventions for children with language disorders.