Natural Language Processing for Ancient Greek

Abstract

Computational methods have produced meaningful and usable results for the study of word semantics, including semantic change. These methods, which belong to the field of Natural Language Processing, have recently been applied to ancient languages; in particular, language modelling has been applied to Ancient Greek, the language on which we focus. In this contribution we explain how vector representations can be computed from word co-occurrences in a corpus and used to locate words in a semantic space, and what kind of semantic information can be extracted from language models. We compare three kinds of language models that can be used to study Ancient Greek semantics: a count-based model, a word embedding model, and a syntactic embedding model. We also show examples of how the quality of their representations can be assessed. We highlight the advantages and potential of these methods, especially for the study of semantic change, together with their limitations.
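
As a rough illustration of the count-based approach mentioned above, the following minimal sketch builds co-occurrence vectors from a tiny corpus and compares words by cosine similarity. It is not the authors' implementation: the toy sentences (transliterated lemmas), the window size, and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus of lemmatised "sentences", invented for illustration only
# (transliterated lemmas; not drawn from any real text or from the article).
corpus = [
    ["hippos", "trecho", "pedion"],
    ["kuon", "trecho", "pedion"],
    ["naus", "pleo", "thalassa"],
    ["hippos", "trecho", "hodos"],
    ["kuon", "trecho", "hodos"],
    ["naus", "pleo", "limen"],
]


def cooccurrence_vectors(sentences, window=2):
    """For each target word, count the words found within `window` positions."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, target in enumerate(sent):
            start = max(0, i - window)
            end = min(len(sent), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(corpus)
# Words that share contexts end up close together in the semantic space.
sim_horse_dog = cosine(vectors["hippos"], vectors["kuon"])
sim_horse_ship = cosine(vectors["hippos"], vectors["naus"])
print(sim_horse_dog, sim_horse_ship)
```

In this toy setting the pair that shares contexts (hippos, kuon) scores higher than the pair that does not (hippos, naus). Count-based models of real corpora usually also reweight the raw counts, and a word embedding model learns dense vectors rather than counting directly; neither step is shown here.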

Available under the CC BY 4.0 license.

DOI: 10.1075/dia.23013.sto
2024-07-02
2024-07-20
