Volume 27, Issue 4
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



This article introduces Corpus PalaeoHibernicum (CorPH), a corpus currently consisting of 78 texts in Early Irish (c. 7th–10th cent.). It was created by the ERC-funded Chronologicon Hibernicum (ChronHib) project by bringing together pre-existing lexical and syntactic databases and adding further crucial texts from the period. In addition to annotation for part of speech (POS), morphology and syntax, a further annotation layer has been developed for CorPH: ‘Variation Tagging’, a tagset that numerically encodes synchronic language variation during the Early Irish period and thus allows much improved research on chronological variation within the material. A second new tool for studying linguistic variation is Bayesian Language Variation Analysis (BLaVA), which addresses the challenge that “not-so-big data” poses to statistical corpus methods. Rather than reporting feature frequencies, BLaVA models language variation as probabilities of variation.
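The contrast between feature frequencies and probabilities of variation can be illustrated with a generic Beta-Binomial update. The sketch below is not the BLaVA model itself, whose details are not given in this abstract. It only shows why a probabilistic estimate behaves differently from a raw frequency when a text offers few attestations. The names `BetaPosterior` and `update`, the uniform prior, and the counts are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta distribution over the probability that a text uses the innovative variant."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected probability of the innovative variant.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Spread of the estimate; it shrinks as more tokens are observed.
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1))


def update(innovative: int, conservative: int,
           prior_alpha: float = 1.0, prior_beta: float = 1.0) -> BetaPosterior:
    """Combine a Beta prior (uniform by default, an assumption) with observed counts."""
    if innovative < 0 or conservative < 0:
        raise ValueError("counts must be non-negative")
    return BetaPosterior(prior_alpha + innovative, prior_beta + conservative)


# Hypothetical counts: both texts show the innovative variant in 100% of tokens,
# but one text has only 2 attestations and the other has 200.
small = update(innovative=2, conservative=0)
large = update(innovative=200, conservative=0)

print(small.mean, small.variance)  # mean is 0.75, well below the raw frequency of 1.0
print(large.mean, large.variance)  # mean is close to 1.0, with much smaller variance
```

The raw frequency is 1.0 for both hypothetical texts, whereas the posterior keeps the sparsely attested text closer to the prior and reports greater uncertainty for it. That is the general sense in which a probabilistic treatment suits "not-so-big data".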

Available under the CC BY 4.0 license.




