The Mental Lexicon - Online First
Online First articles are the published Version of Record, made available as soon as they are finalized and formatted. They are generally accessible to current subscribers until they have been included in an issue, which is accessible to subscribers to the relevant volume.
Explorations of morphological structure in distributional space
Author(s): Harald Baayen, Dunstan Brown and Yu-Ying Chuang
Available online: 12 September 2023
Native and foreign language orthotactic probability and neighborhood density in word learning
Author(s): Josh Ring, Frank Leoné and Ton Dijkstra
Available online: 17 August 2023
Abstract: Laboratory studies on word learning in a foreign language (L2) have identified several variables involved in the learning process, key amongst them the orthotactic probability and neighborhood density of new words relative to learners’ native (L1) lexicons. More recently, learners’ sensitivity to orthotactic probability and neighborhood density relative to their developing L2 lexicons has come into focus. Past studies on word learning have largely focused on early stages of learning, in controlled studies spanning hours or days. Few studies have considered large corpora of ‘real-life’ learning data, spanning several weeks. In this study, we validate past findings outside of controlled laboratory conditions, by analyzing a dataset collected from Duolingo (Settles et al., 2018), a popular language learning app. Effects of orthotactic probability and neighborhood density observed in controlled studies persist under uncontrolled, big-data conditions for learners of Spanish, but not French. As learning progresses, we observe a previously unreported reversal of the effects of L1 orthotactic probability and neighborhood density, challenging theoretical models of word learning. Finally, we confirm the importance of orthotactic probability and neighborhood density relative to learners’ developing L2 Spanish lexicons, lending support to theories which posit that the same processes underlie both L1 and L2 acquisition.
Making sense of spoken plurals
Author(s): Elnaz Shafaei-Bajestan, Peter Uhrig and R. Harald Baayen
Available online: 17 August 2023
Abstract: Distributional semantics offers new ways to study the semantics of morphology. This study focuses on the semantics of noun singulars and their plural inflectional variants in English. Our goal is to compare two models for the conceptualization of plurality. One model (FRACSS) proposes that all singular-plural pairs should be taken into account when predicting plural semantics from singular semantics. The other model (CCA) argues that conceptualization for plurality depends primarily on the semantic class of the base word. We compare the two models on the basis of how well the speech signal of plural tokens in a large corpus of spoken American English aligns with the semantic vectors predicted by the two models. Two measures are employed: the performance of a form-to-meaning mapping and the correlations between form distances and meaning distances. Results converge on a superior alignment for CCA. Our results suggest that usage-based approaches to pluralization in which a given word’s own semantic neighborhood is given priority outperform theories according to which pluralization is conceptualized as a process building on high-level abstraction. We see that what has often been conceived of as a highly abstract concept, [+plural], is better captured via a family of mid-level partial generalizations.
The cognate continuum
Author(s): Iris M. Strangmann, Katarina Antolovic, Pernille Hansen and Hanne Gram Simonsen
Available online: 24 July 2023
Abstract: Cognates, words that are similar in form and meaning across two languages, form compelling test cases for bilingual access and representation. Overwhelmingly, cognate pairs are subjectively selected in a categorical either-or manner, often with criteria and modality unspecified. Yet the few studies that take a more nuanced approach, selecting cognate pairs along a continuum of overlap, show interesting, albeit somewhat divergent results. This study compares three measures that quantify cognateness continuously to obtain modality-specific cognate scores for the same set of Norwegian-English word-translation pairs: (1) Researcher Intuitions – bilingual researchers rate the degree of overlap between the paired words, (2) Levenshtein Distance – an algorithm that computes overlap between word pairs, and (3) Translation Elicitation – English-speaking monolinguals guess what Norwegian words mean. Results demonstrate that cognateness can be ranked on a continuum and reveal measure- and modality-specific effects. Orthographic presentation yields higher cognateness status than auditory presentation overall. Though all three measures intercorrelated moderately to highly, Researcher Intuitions demonstrated a bimodal distribution, yielding scores on the high and low end of the spectrum, consistent with the common categorical approach in the field. Levenshtein Distance would be preferred for fine-grained distinctions along the continuum of form overlap.
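Illustration (not from the article): the Levenshtein Distance measure named in the abstract above can be sketched as follows. The abstract does not say how the authors turned edit distance into a cognate score, so the length normalization in `normalized_overlap` is an assumption made here for illustration, and the word pairs in the usage note are hypothetical examples rather than items from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalized_overlap(a: str, b: str) -> float:
    """Assumed scoring: 1.0 for identical forms, 0.0 for no overlap.
    Divides the edit distance by the length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Under this assumed normalization, an identical pair such as `normalized_overlap("arm", "arm")` scores 1.0, while a partially overlapping pair such as `normalized_overlap("katt", "cat")` scores 0.5, which gives a graded rather than categorical measure of form overlap. The same function could be applied to phonemic transcriptions instead of spellings to obtain a score for the auditory modality.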
Paradigm gaps are associated with weird “distributional semantics” properties
Author(s): Yu-Ying Chuang, Dunstan Brown, Harald Baayen and Roger Evans
Available online: 30 May 2023
Abstract: This study investigates the phenomenon of defectiveness in Russian case and number noun paradigms from the perspective of distributional semantics. We made use of word embeddings, high-dimensional vectors trained from large text corpora, and compared the observed paradigms of nouns that are defective in the genitive plural, as suggested by Zaliznjak (1977), with the observed paradigms for non-defective nouns. When the embeddings of about 20,000 inflected forms were projected onto a two-dimensional space, clusters of case and number within case were found, suggesting global semantic similarity for words with the same inflectional features. Moreover, defective lexemes were characterized by lower semantic transparency, in that inflected forms of the same lexeme are semantically less similar to each other, and their meanings are also more idiosyncratic. Furthermore, compared to non-defective lexemes, inflected forms from defective lexemes are further away from the idealized average case-number meanings, obtained by averaging over the vectors of all inflected forms of the same case-number combination. As a consequence, the semantics of defective forms are predicted less precisely by a simple model of conceptualization that assumes that the meaning of a given Russian inflected form is approximated well by the sum of pertinent embeddings of the lexeme, case, and number within case. We conclude that the relationship between defectiveness and semantics, at least the kind captured by word embeddings, is stronger than has been anticipated previously.
Regular polysemy and novel word-sense identification
Author(s): Alizée Lombard, Richard Huyghe, Lucie Barque and Doriane Gras
Available online: 30 March 2023
Productivity and semantic transparency
Author(s): Shen Tian and Harald Baayen
Available online: 28 March 2023
Abstract: We used word embeddings to study the relation between productivity and semantic transparency. We compiled a dataset with around 2700 two-syllable compounds that shared position-specific constituents (henceforth pivots) and some 1100 suffixed words. For each pivot and suffix, we calculated measures of productivity as well as measures of semantic transparency. For compounds, productivity (P) was negatively correlated with the number of types (V) and with the semantic similarity between non-pivot constituents and their compounds. Conversely, the greater semantic similarity of the pivot with either the compound or the non-pivot constituent predicted higher degrees of productivity. Visualization with t-SNE revealed clustering of suffixed words’ embeddings, but no by-pivot clustering for compounds, except for a minority of pivots whose regions in semantic space did not contain intruding unrelated compounds. A subset of these pivots was found to realize a fixed shift in semantic space from the base word to the corresponding compound, a property that also emerged for several suffixes. For these pivots, no correlation between P and V was present. Thus, Mandarin compounds appear to realize, at one extreme, motivated but unsystematic concept formation (where other pivots could just as well have been used), and at the other extreme, systematic suffix-like semantics.
An inquiry into the semantic transparency and productivity of German particle verbs and derivational affixation
Author(s): Inna V. Stupak and R. Harald Baayen
Available online: 23 March 2023
Abstract: This study addresses the relation between morphological productivity and semantic transparency. Using distributional semantics, we compare German word formation using particles with derivational word formation. We observed that derivational suffixes, but not particles, tend to make strong independent semantic contributions to their carrier words. In two-dimensional t-SNE maps, complex words show clustering by affix, but not by particle. Furthermore, the semantic vectors of suffixed words are predictable from their base words with higher accuracy than is possible for particle verbs. For particle verbs, but not affixed verbs, semantic similarity within the set of complex words correlated negatively with the number of types. Furthermore, only for particle verbs, a greater number of observed types predicted a reduced probability of observing unseen types. We propose that particle verbs primarily serve the onomasiological function of labeling, resulting in relatively idiosyncratic semantic vectors. By contrast, words sharing derivational affixes form distinct clusters in semantic space while maintaining strong and consistent semantic relations with their base words. This enables these words to serve not only as labels, but also allows them to be used with an anaphoric function in discourse.
A generating model for Finnish nominal inflection using distributional semantics
Author(s): Alexandre Nikolaev, Yu-Ying Chuang and R. Harald Baayen
Available online: 17 March 2023
Abstract: Finnish nouns are characterized by rich inflectional variation, with obligatory marking of case and number, with optional possessive suffixes and with the possibility of further cliticization. We present a model for the conceptualization of Finnish inflected nouns, using pre-compiled fasttext embeddings (300-dimensional semantic vectors that approximate words’ meanings). Instead of deriving the semantic vector of an inflected word from another word in its paradigm, we propose that an inflected word is conceptualized by means of summation of latent vectors representing the meanings of its lexeme and its inflectional features. We tested this model on the 2,000 most frequent Finnish nouns and their inflected word forms from a corpus of Finnish (84 million tokens). Visualization of the semantic space of Finnish using t-SNE clarified that a ‘main effects’ additive model does not do justice to the semantics of inflection. In Finnish, how number is realized turns out to vary substantially with case. Further interactions emerged with the possessive suffixes and the clitics. By taking these interactions into account, the accuracy of our model, evaluated with the fasttext embeddings as gold standard, improved from 76% to 89%. Analyses of the errors made by the model clarified that 7.5% of errors are due to overabundance (and hence not true errors), and that 16.5% of the errors involved exchanges of semantically highly similar stems (lexemes). Our results indicate, first, that the semantics of Finnish noun inflection are more intricate than assumed thus far, and second, that these intricacies can be captured with surprisingly high accuracy by a simple generating model based on imputed semantic vectors for lexemes, inflectional features, and interactions of inflectional features.
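Illustration (not from the article): the summation idea described in the abstract above, in which an inflected form's vector is the sum of a lexeme vector, inflectional feature vectors, and optionally vectors for feature interactions, might be sketched as below. This is a toy sketch, not the authors' implementation: the function names, the three-dimensional vectors, and their values are all hypothetical, and the abstract's actual vectors are 300-dimensional and estimated from corpus data.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def add_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum of equally long vectors."""
    return [sum(components) for components in zip(*vectors)]


def compose(
    lexeme: str,
    features: Sequence[str],
    lexeme_vecs: Dict[str, Vector],
    feature_vecs: Dict[str, Vector],
    interaction_vecs: Optional[Dict[Tuple[str, ...], Vector]] = None,
) -> Vector:
    """Predict the vector of an inflected form by summation.

    With interaction_vecs=None this is a 'main effects' model:
    lexeme vector plus one vector per inflectional feature.
    If an interaction vector is stored for the exact feature
    combination (keyed by the sorted feature tuple), it is added too.
    """
    parts = [lexeme_vecs[lexeme]] + [feature_vecs[f] for f in features]
    if interaction_vecs is not None:
        key = tuple(sorted(features))
        if key in interaction_vecs:
            parts.append(interaction_vecs[key])
    return add_vectors(parts)


# Hypothetical toy vectors (3 dimensions instead of 300).
lexeme_vecs = {"talo": [1.0, 0.0, 0.5]}          # 'house'
feature_vecs = {
    "inessive": [0.0, 0.5, 0.0],
    "plural": [0.25, 0.0, 0.0],
}
# Assumed interaction term: plural is realized differently within this case.
interaction_vecs = {("inessive", "plural"): [0.0, 0.25, 0.25]}

main_effects = compose("talo", ["inessive", "plural"], lexeme_vecs, feature_vecs)
with_interaction = compose(
    "talo", ["inessive", "plural"], lexeme_vecs, feature_vecs, interaction_vecs
)
```

In a setup like this, a predicted vector would be scored by checking whether its nearest neighbor among the corpus-derived embeddings is the intended word form, which is one plausible reading of evaluating "with the fasttext embeddings as gold standard".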
The time-course of contextual modulation for underspecified meaning
Author(s): Yao-Ying Lai, David Braze and Maria Mercedes Piñango
Available online: 23 February 2023
Abstract: Sentences like (1) “The singer began the album” are ambiguous between an agentive reading (The singer began recording/playing/etc. the album) and a constitutive reading (The singer’s song was the first track). The ambiguity is rooted in the meaning specification of the aspectual-verb class, which demands its complement be construed as a structured individual along a dimension (e.g., spatial, informational, eventive). In (1), the complement can be construed as a set of eventualities (eventive) or musical content (informational). Processing aspectual-verb sentences is shown to involve (a) exhaustive lexical-function retrieval and (b) construal of multiple dimension-specific structured individuals, leading to multiple compositions with agentive and constitutive readings. The ultimate interpretation depends on the biased dimensions in context. Our eye-tracking study comparing sentences in different contexts (agentive vs. constitutive-biasing) shows not only the aspectual-verb composition effect, previously reported for the agentive readings, but also a comparable processing profile for the constitutive readings, a novel finding supporting the unified linguistic analysis and processing implementation of the two readings. Regardless of reading, the composition effect is observable even after the complement has been retrieved, indicating that the fundamental lexico-semantic compositional processes must take place before context can serve as a constraining force.