
Full text loading...
Corpora are an important resource for both teaching and research. Arabic lacks sufficient resources in this field, so a research project has been designed to compile a corpus, which represents the state of the Arabic language at the present time and the needs of end-users. This report presents the result of a survey of the needs of teachers of Arabic as a foreign language (TAFL) and language engineers. The survey shows that a wide range of text types should be included in the corpus. Overall, our survey confirms our view that existing corpora are too narrowly limited in source-type and genre, and that there is a need for a freely-accessible corpus of contemporary Arabic covering a broad range of text-types. We have collected and published an initial version of the Corpus of Contemporary Arabic (CCA) to meet these design issues. The CCA is freely downloadable via WWW from http://www.comp.leeds.ac.uk/arabic.