Volume 10, Issue 1
  • ISSN 2210-2116
  • E-ISSN: 2210-2124



This paper uses a novel data-driven probabilistic approach to address the century-old Inner-Outer hypothesis of Indo-Aryan. I develop a Bayesian hierarchical mixed-membership model to assess the validity of this hypothesis using a large data set of automatically extracted sound changes operating between Old Indo-Aryan and Modern Indo-Aryan speech varieties. I employ different prior distributions in order to model sound change, one of which, the Logistic Normal distribution, has not received much attention in linguistics outside of Natural Language Processing, despite its many attractive features. I find evidence for cohesive dialect groups that have made their imprint on contemporary Indo-Aryan languages, and find that when a Logistic Normal prior is used, the distribution of dialect components across languages is largely compatible with a core-periphery pattern similar to that proposed under the Inner-Outer hypothesis.
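As a rough illustration of the Logistic Normal prior mentioned above, the sketch below draws mixed-membership vectors by passing multivariate normal draws through a softmax. It is a minimal sketch, not the paper's model: the three components, the mean `MU`, the Cholesky factor `CHOL`, and all function names are hypothetical values chosen for the example.

```python
import math
import random


def softmax(z):
    """Map a real vector to the probability simplex (numerically stable)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logistic_normal(mu, chol, rng):
    """Draw one simplex point: softmax of a multivariate normal draw.

    `chol` is a lower-triangular Cholesky factor of the covariance matrix.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [mu[i] + sum(chol[i][j] * eps[j] for j in range(i + 1))
         for i in range(len(mu))]
    return softmax(z)


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical settings: components 0 and 1 share most of their variation,
# component 2 varies independently and only slightly.
MU = [0.0, 0.0, 0.0]
CHOL = [[2.0, 0.0, 0.0],
        [2.0, 0.3, 0.0],
        [0.0, 0.0, 0.3]]

rng = random.Random(0)
draws = [sample_logistic_normal(MU, CHOL, rng) for _ in range(2000)]
r01 = pearson([d[0] for d in draws], [d[1] for d in draws])
print(f"correlation between components 0 and 1: {r01:.2f}")
```

Under a Dirichlet prior the covariance between any two components is always negative, so two dialect components could never be modeled as tending to co-occur. The covariance matrix of the Logistic Normal removes that restriction, which is one of the features that makes it attractive for mixed-membership models.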




