
Abstract
This study uses word embeddings to investigate the semantic changes underlying the creation of two adversative connectives in Portuguese, porém and mas ‘but, however’. For porém, we chart its development from an original PP formed by a preposition with a causal meaning (por) and a demonstrative pronoun that referred anaphorically to a previous proposition (en(de)). For mas, we trace its change from an adverb meaning ‘more’. Adopting a distributional semantics approach, we use word embedding models trained on two corpora, the CIPM (Corpus Informatizado do Português Medieval, containing texts from the 12th–16th centuries) and COLONIA (containing texts from the 16th–20th centuries). We produce a measure of change based on the similarity scores of porém and mas with respect to words in relevant semantic categories in each corpus, representing the source and the target meanings. This paper, which constitutes the first computational study of semantic change in Portuguese, also discusses challenges and outlines steps to be taken into consideration when choosing embedding algorithms for small historical corpora.
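The similarity-based measure of change described above can be illustrated with a minimal sketch. It is not the study's actual procedure: the function names (`category_similarity`, `shift_score`), the two-dimensional toy vectors, and the choice of mais and contudo as stand-ins for the source and target categories are all assumptions made for illustration. The sketch computes, for each period, how much closer a connective sits to the target-meaning words than to the source-meaning words, then takes the difference between periods.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def category_similarity(model, word, category):
    """Mean cosine similarity between `word` and the category words found in `model`."""
    sims = [cosine(model[word], model[w]) for w in category if w in model]
    if not sims:
        raise ValueError("no category word is present in the model")
    return sum(sims) / len(sims)

def shift_score(early, late, word, source_words, target_words):
    """Hypothetical change measure: growth in (target minus source) similarity."""
    def bias(model):
        return (category_similarity(model, word, target_words)
                - category_similarity(model, word, source_words))
    return bias(late) - bias(early)

# Toy embeddings (assumed values, not drawn from CIPM or COLONIA).
early = {"mas": [1.0, 0.0], "mais": [1.0, 0.0], "contudo": [0.0, 1.0]}
late = {"mas": [0.0, 1.0], "mais": [1.0, 0.0], "contudo": [0.0, 1.0]}

score = shift_score(early, late, "mas", ["mais"], ["contudo"])
print(score)  # 2.0: "mas" moves from the 'more' pole to the adversative pole
```

Because each period's score is computed only within its own model, a sketch like this does not require aligning the two embedding spaces, which may matter when the corpora are small.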