International Journal of Corpus Linguistics - Volume 10, Issue 4, 2005
-
Frequency of ‘core idioms’ in the British National Corpus (BNC)
Author(s): Lynn E. Grant
pp. 429–451 (23)
This article looks at how a comprehensive list of one category of idioms, that of ‘core idioms’, was established. When the criteria that define a core idiom were strictly applied to a dictionary of idioms, the large number of ‘idioms’ was reduced to a small number of ‘core idioms’. The original list from the first source dictionary was extended by applying the same criteria to other idiom dictionaries and other sources of idioms. Once the list was complete, a corpus search of the final total of 104 ‘core idioms’ was carried out in the British National Corpus (BNC). The search revealed that none of the 104 core idioms occurs frequently enough to merit inclusion in the 5,000 most frequent words of English.
-
New generation corpus-based frequency dictionaries: The case of Czech
Author(s): František Čermák and Michal Křen
pp. 453–467 (15)
After a brief outline of the history of frequency dictionaries, a renewed need for such dictionaries, based on modern large corpora, is observed. A linguistic background and some criticism of earlier work are offered. It is stressed that, next to the traditional and predominant paradigmatic approach, at least some attention should be paid to syntagmatic or combinatorial aspects of the lexis as well, as these are readily available in modern corpora. Against this background, a solution to this need and a projection of corpus possibilities are presented, as realized in the recent large Frequency Dictionary of Czech (Frekvenční slovník češtiny). A brief discussion of the linguistic and lexicographic problems is followed by a step-by-step description of the technical solution.
-
A multi-level semantic approach to Korean causal conjunctive suffixes -(e)se and -(u)nikka: A corpus-based analysis
Author(s): Sang-suk Oh
pp. 469–488 (20)
The purpose of this paper is to discuss the actual usage of the two Korean causal conjunctive suffixes, -(e)se and -(u)nikka, and to propose a multi-layered semantics for them based on analysis of corpus data. To account for the functional differences between the two conjunctives, most previous studies focused on their different syntactic distributions or semantic contrasts from an objectivist viewpoint, failing to incorporate their polyfunctionality, semantic overlap and pragmatic ambiguity. This paper proposes that the meanings of the two causal suffixes are distributed over four different cognitive-discourse levels: the content, epistemic, speech act, and discourse levels. Corpus analysis reveals that all four levels are accessible to both conjunctive suffixes, and that the difference between the two lies in how accessible each of these levels is in their sentence semantics. This finding suggests that we treat these linguistic categories more flexibly by accepting their gradient and pragmatically ambiguous status.
-
Using corpora in machine-learning chatbot systems
Author(s): Bayan Abu Shawar and Eric Steven Atwell
pp. 489–516 (28)
A chatbot is a machine conversation system which interacts with human users via natural conversational language. Software to machine-learn conversational patterns from a transcribed dialogue corpus has been used to generate a range of chatbots speaking various languages and sublanguages, including varieties of English as well as French, Arabic and Afrikaans. This paper presents a program to learn from spoken transcripts of the Dialogue Diversity Corpus of English, the Minnesota French Corpus, the Corpus of Spoken Afrikaans, the Qur'an Arabic-English parallel corpus, and the British National Corpus of English; we discuss the problems which arose during learning and testing. The automation process achieved two main goals. One was the ability to generate different versions of the chatbot in different languages, bringing chatbot technology to languages with few if any NLP resources: the corpus-based learning techniques transferred straightforwardly to develop chatbots for Afrikaans and Qur'anic Arabic. The second was the ability to learn a very large number of categories within a short time, avoiding the effort and errors of doing such work manually: we generated more than one million AIML categories or conversation-rules from the BNC corpus, 20 times the size of existing AIML rule-sets, and probably the biggest AI Knowledge-Base ever.
-
Creating and using Web corpora
Author(s): Mike Thelwall
pp. 517–541 (25)
The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.