Reproducibility, replicability, and robustness in corpus linguistics

Abstract

This introduction to the special issue calls for more transparent and robust research practices in the field. It situates the discussion within the broader replication crisis in the life and social sciences and explores its relevance for corpus linguistics. The article identifies key areas for improvement — data management, workflows, and reporting — and showcases tools and principles such as FAIR/CARE, version control, reproducible notebooks, and open repositories. It highlights how corpus linguistics can build on open science infrastructures to enhance methodological rigor. Practical challenges, including data sensitivity and skill gaps, are addressed with actionable strategies. The issue brings together contributions that clarify core terminology, test the robustness of established methods, and suggest concrete ways forward. Together, these articles offer conceptual and practical guidance for making corpus linguistic research more open, verifiable, and aligned with broader scientific standards.

/content/journals/10.1075/ijcl.25081.sch
2025-06-12
2025-07-19
  • Article Type: Editorial
Keywords: transparency ; robustness ; replicability ; accountability ; reproducibility