Volume 1 Number 2
  • ISSN 2950-1806
  • E-ISSN: 2950-1792

Abstract

This paper introduces the Hoosiers Arabic Ellipsis Corpus, a novel dataset targeting syntactic ellipsis in Arabic. Addressing the significant challenge ellipsis poses to natural language processing (NLP) technologies, the Hoosiers Arabic Ellipsis Corpus leverages the Corpus Query Language (CQL) to extract ellipsis instances from the ArTenTen corpus. To the best of our knowledge, this is the first comprehensive dataset of its kind, filling a critical gap in resources for Arabic, which remains under-resourced in NLP studies. We evaluate the corpus through three computational experiments: detecting sentences with ellipsis, predicting the location of elided elements, and generating missing words using state-of-the-art large language models (LLMs). Results demonstrate that few-shot prompting significantly improves LLM performance, with Gemini 2.5 Pro achieving the highest accuracy in ellipsis detection (95.6%). However, LLMs struggled with precisely locating and reconstructing elided elements. The findings highlight the challenges of ellipsis processing in Arabic and point to the need for larger, more balanced datasets and further refinement of NLP models to handle structural inference.

/content/journals/10.1075/arli.00013.abd
2026-02-27
2026-03-06

References

  1. Abdelali, Ahmed, Darwish, Kareem, Durrani, Nadir, & Mubarak, Hamdy
    (2016) Farasa: A fast and furious segmenter for Arabic. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: Demonstrations (pp. 11–16). Association for Computational Linguistics. https://aclanthology.org/N16-3003/
    https://doi.org/10.18653/v1/N16-3003
  2. AbuOdeh, Muhammed, Phan, Long, Elshabrawy, Ahmed, & Habash, Nizar
    (2024) Palmyra 3.0: A User-Friendly Cloud-Based Platform for Morphology and Dependency Syntax Annotation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 12585–12591).
  3. Al Mana, Suaad
    (1986) Poetic necessity from the perspective of the medieval Arab critics and rhetoricians (Al-Darurah, Sibawayhi, poetic language, Qudamah Ibn Ja’far) [Doctoral dissertation, University of Michigan].
  4. Algryani, Ali
    (2019) The syntax of sluicing in Omani Arabic. International Journal of English Linguistics, 9(6), 337–346.
    https://doi.org/10.5539/ijel.v9n6p337
  5. Alhalalmeh, Bahjat
    (2020) Nominal ellipsis in Jordanian Arabic Advertisements. Journal of the Faculty of Arts and Humanities, Suez Canal University, 3(32), 1–31.
    https://doi.org/10.21608/jfhsc.2020.204441
  6. Al-Horais, Nasser
    (2000) Arabic negation marker (Laysa) with bare argument ellipsis and its association with information structure. Argument, 20011, 2006–2008.
  7. Al-Khawalda, Mohammad
    (2002) Ellipsis in Arabic and English. International Journal of Arabic-English Studies, 3(1), 183–199.
    https://doi.org/10.33806/ijaes2000.3.1.12
  8. Al-Liheibi, Fahd
    (1999) Aspects of sentence analysis in the Arabic linguistic tradition, with particular reference to ellipsis [Doctoral dissertation, Durham University].
  9. Antoun, Wissam, Baly, Fady, & Hajj, Hazem
    (2020) AraBERT: Transformer-based model for Arabic language understanding. In Hend Al-Khalifa, Walid Magdy, Kareem Darwish, Tamer Elsayed, & Hamdy Mubarak (Eds.), Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection (pp. 9–15). Marseille, France: European Language Resource Association. https://aclanthology.org/2020.osact-1.2/
  10. Arts, Tressy, Belinkov, Yonatan, Habash, Nizar, Kilgarriff, Adam, & Suchomel, Vit
    (2014) arTenTen: Arabic Corpus and Word Sketches. Journal of King Saud University-Computer and Information Sciences, 26(4), 357–371.
    https://doi.org/10.1016/j.jksuci.2014.06.009
  11. Assiri, Ahmed
    (2021) Gapping in Modern Standard Arabic: An Agree-Based Analysis. Umm Al-Qura University Journal for Languages & Literature, (27).
    https://doi.org/10.54940/ll38299970
  12. Bouzid, Saoussen, & Zribi, Chiraz
    (2021) Efficient learning approach for pronominal anaphora and ellipsis identification and resolution in Arabic texts. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3335–3348.
    https://doi.org/10.1109/TASLP.2021.3120649
  13. Carnie, Andrew
    (2021) Syntax: A generative introduction. John Wiley & Sons.
  14. Cavar, Damir, Mompelat, Ludovic, & Abdo, Muhammad
    (2024) The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications. In Michael Hahn, Alexey Sorokin, & Ritesh Kumar (Eds.), Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (pp. 46–54). Association for Computational Linguistics.
  15. Elshabrawy, Ahmed, AbuOdeh, Muhammed, Inoue, Go, & Habash, Nizar
    (2023) CamelParser2.0: A State-of-the-Art Dependency Parser for Arabic. In Proceedings of ArabicNLP 2023 (pp. 170–180).
    https://doi.org/10.18653/v1/2023.arabicnlp-1.15
  16. El-Shiyab, Said
    (1998) Ellipsis in Arabic and its impact on translation. al-’Arabiyya, 39–54.
  17. Fatani, Afnan
    (2010) Al-Zarkashī on Ellipsis in the Qur’ān: A Translation & Critical Synopsis. Journal of Arabic Linguistics, 81.
  18. Green, Spence, & Manning, Christopher
    (2010) Better Arabic parsing: Baselines, evaluations, and analysis. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010) (pp. 394–402).
  19. Habash, Nizar, & Roth, Ryan
    (2009) CATiB: The Columbia Arabic Treebank. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (pp. 221–224). Association for Computational Linguistics.
    https://doi.org/10.3115/1667583.1667651
  20. Haddar, Kais, & Hamadou, Abdelmajid
    (1998) An Ellipsis Detection Method Based on a Clause Parser for Arabic Language. In Proceedings of the Eleventh International Florida Artificial Intelligence Research Society Conference (pp. 270–274).
  21. Hawkins, Roger
    (2012) Knowledge of English verb phrase ellipsis by speakers of Arabic and Chinese. Linguistic Approaches to Bilingualism, 2(4), 404–438.
    https://doi.org/10.1075/lab.2.4.03haw
  22. Homerin, Thomas
    (2007) [Review of the book The Diwan of Ibn al-Farid: Readings of its Text Throughout History, by G. Scattolin]. Mamlūk Studies Review, 11(1), [243].
    https://doi.org/10.6082/M1QJ7FG5
  23. Johnson, Kyle
    (2001) What VP Ellipsis Can Do, and What it Can’t, But Not Why. The Handbook of Contemporary Syntactic Theory, 439–479. Portico.
    https://doi.org/10.1002/9780470756416.ch14
  24. Kilgarriff, Adam, Rychly, Pavel, Smrz, Pavel, & Tugwell, David
    (2008) The Sketch Engine. In P. Fontenelle (Ed.), Practical Lexicography: A Reader (pp. 297–306). Oxford University Press.
    https://doi.org/10.1093/oso/9780199292332.003.0020
  25. Maamouri, Mohamed, Bies, Ann, Buckwalter, Tim, & Mekki, Wigdan
    (2004) The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. In NEMLAR conference on Arabic language resources and tools (Vol. 271, pp. 466–467).
  26. Manning, Christopher, Surdeanu, Mihai, Bauer, John, Finkel, Jenny, Bethard, Steven, & McClosky, David
    (2014) The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations (pp. 55–60).
    https://doi.org/10.3115/v1/P14-5010
  27. Mansour, Mohamed
    (2007) Semantic constraints on licensing VP-ellipsis and VP-gapping in Arabic. Bulletin of the Faculty of Arts, Assiut University.
  28. McShane, Marjorie
    (2005) A Theory of Ellipsis. Oxford University Press. Accessed 24 May 2024.
    https://doi.org/10.1093/oso/9780195176926.001.0001
  29. Merchant, Jason
    (2006) Sluicing. The Blackwell companion to syntax, 271–291.
    https://doi.org/10.1002/9780470996591.ch60
  30. Merchant, Jason
    (2018) Ellipsis: A survey of analytical approaches. In Jeroen van Craenenbroeck & Tanja Temmerman (Eds.), The Oxford Handbook of Ellipsis, Oxford Handbooks. Accessed 2 June 2024.
    https://doi.org/10.1093/oxfordhb/9780198712398.013.2
  31. Quirk, Randolph, Greenbaum, Sidney, Leech, Geoffrey, & Svartvik, Jan
    (1985) A comprehensive grammar of the English language. Longman.
  32. Solimando, Cristina
    (2011) Ellipsis in the Arabic Linguistic Thinking (8th–10th Century). In The Word in Arabic. Leiden, The Netherlands: Brill.
    https://doi.org/10.1163/9789004206427_006
  33. Taylor, Wilson
    (1953) “Cloze Procedure”: A New Tool for Measuring Readability. Journalism Quarterly, 30(4), 415–433.
    https://doi.org/10.1177/107769905303000401

  • Article Type: Research Article
Keyword(s): Arabic NLP; computational syntax; Ellipsis; Large Language Models