Volume 28, Issue 4
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



We propose a method for the automatic induction of categories of Spanish discourse markers using parallel corpora, based on a quantitative and empirical approach that minimises reliance on explicit linguistic knowledge. We conducted the analysis using a large Spanish-English parallel corpus. First, we used this corpus to obtain a list of parenthetical discourse markers in each language. Then, we used it as a “semantic mirror”, inspecting the English equivalents and assessing which Spanish discourse markers fulfil a similar function in discourse, and vice versa. The result of this procedure is an emergent categorisation of discourse markers. The main contribution is to offer empirical evidence for the adequacy of existing manually compiled taxonomies and for the potential to discover new, previously unaccounted-for categories. In this article we focus on units pertaining to the Spanish language but, since the method is purely quantitative, it can be applied to other languages as well.
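The “semantic mirror” step can be illustrated with a minimal sketch. It is not the implementation used in the study: the aligned pairs are invented toy data, and the cosine measure, the single-linkage merging, the 0.5 threshold and all function names are assumptions chosen for brevity. The sketch also covers only one direction (Spanish markers grouped by their English equivalents); the reverse direction would follow the same pattern.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: (Spanish marker, English equivalent) pairs, standing in
# for what a sentence-aligned Spanish-English parallel corpus might yield.
ALIGNED_PAIRS = [
    ("sin embargo", "however"), ("sin embargo", "however"),
    ("sin embargo", "nevertheless"),
    ("no obstante", "however"), ("no obstante", "nevertheless"),
    ("no obstante", "nevertheless"),
    ("por lo tanto", "therefore"), ("por lo tanto", "therefore"),
    ("por lo tanto", "thus"),
    ("por consiguiente", "therefore"), ("por consiguiente", "thus"),
    ("además", "moreover"), ("además", "furthermore"),
    ("asimismo", "moreover"), ("asimismo", "furthermore"),
]


def translation_profiles(pairs):
    """Count, for each Spanish marker, how often each English equivalent occurs."""
    profiles = defaultdict(Counter)
    for spanish, english in pairs:
        profiles[spanish][english] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def induce_categories(pairs, threshold=0.5):
    """Merge markers whose translation profiles are similar (single linkage).

    The threshold is an illustrative assumption, not a value from the article.
    """
    profiles = translation_profiles(pairs)
    clusters = [{marker} for marker in sorted(profiles)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine(profiles[a], profiles[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return sorted(sorted(cluster) for cluster in clusters)


if __name__ == "__main__":
    for category in induce_categories(ALIGNED_PAIRS):
        print(category)
```

On this toy input the markers fall into three groups that correspond to what manual taxonomies would label counter-argumentative, consecutive and additive, which is the kind of emergent categorisation the abstract describes. No category labels are supplied to the procedure.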






  • Article Type: Research Article
Keyword(s): clustering; discourse markers; inductive methods; parallel corpus; Spanish