Flexible multi-layer spoken dialogue corpora
