Quantifying lexical and pronunciation variation between three Arabic varieties*
This paper reports on computational measures that quantify the lexical and pronunciation variation between three varieties of Arabic: Moroccan Arabic, Egyptian Arabic, and Gulf Arabic. We provide three measures of linguistic variation, all computed from elicitations of the Swadesh list. The first is a lexical measure based on the percentage of noncognate words. The second is also a lexical measure, but it takes pronunciation into account by considering the IPA transcriptions of the same word list. The third is a pronunciation measure that computes the variation between the IPA transcriptions of the cognate words in the Swadesh list. The results of all three measures show that geographically proximate varieties are also linguistically closer to each other.
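As an illustration only, the three kinds of measure described above could be sketched as follows. The sketch assumes that string variation is scored with a length-normalized Levenshtein (edit) distance over transcription symbols, which is one common choice and is not stated in the abstract. The function names and the `TOY_ITEMS` list are hypothetical, and the toy strings are made-up placeholders rather than Arabic data.

```python
# Illustrative sketch only: the use of normalized Levenshtein distance is an
# assumption, and all names and data below are hypothetical placeholders.

def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer string (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def noncognate_percentage(items):
    """Measure 1 (sketch): percentage of list items judged noncognate."""
    noncognates = sum(1 for _, _, is_cognate in items if not is_cognate)
    return 100.0 * noncognates / len(items)


def mean_distance_all(items):
    """Measure 2 (sketch): mean transcription distance over the whole list."""
    return sum(normalized_distance(a, b) for a, b, _ in items) / len(items)


def mean_distance_cognates(items):
    """Measure 3 (sketch): mean transcription distance over cognates only."""
    cognates = [(a, b) for a, b, is_cognate in items if is_cognate]
    return sum(normalized_distance(a, b) for a, b in cognates) / len(cognates)


# Made-up placeholder entries: (form in variety A, form in variety B, cognate?)
TOY_ITEMS = [
    ("kata", "kata", True),
    ("kata", "kita", True),
    ("mulu", "sore", False),
    ("tab", "tap", True),
]

if __name__ == "__main__":
    print(noncognate_percentage(TOY_ITEMS))
    print(mean_distance_all(TOY_ITEMS))
    print(mean_distance_cognates(TOY_ITEMS))
```

Each measure would be computed once per pair of varieties, so three varieties yield three pairwise scores per measure. Note that this sketch compares Python characters, so an IPA symbol written with a combining diacritic counts as more than one unit unless the transcriptions are segmented beforehand.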