The New Statistics for applied linguistics

Abstract

The New Statistics is an approach to scholarly research that offers an alternative to the problematic overreliance on significance testing currently plaguing the research literature. This paper describes the problems associated with significance testing and introduces the key concepts of the data analysis that best fits the goals of the New Statistics: estimation of effect sizes and confidence intervals. These concepts are applied in a reanalysis of the summary data from an article that was recently published in this journal. This makes it possible to compare the estimation approach advocated by the New Statistics with standard significance tests and to discuss potential drawbacks of this approach as a means of gathering quantitative evidence in support of our substantive hypotheses.
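
To make the idea of estimation from summary data concrete, the following is a minimal sketch and not the analysis reported in the paper. It estimates a difference between two group means, a confidence interval for that difference, and a standardized effect size (Cohen's d) from means, standard deviations, and sample sizes alone. The function name `mean_diff_ci` and all input values are hypothetical. The sketch assumes equal population variances and uses a large-sample normal critical value; with small samples a critical value from the t distribution would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def mean_diff_ci(m1, sd1, n1, m2, sd2, n2, level=0.95):
    """Estimate a mean difference from summary statistics.

    Returns (difference, lower bound, upper bound, Cohen's d).
    Assumes equal population variances and a normal approximation.
    """
    diff = m1 - m2
    # Pooled standard deviation across the two groups
    sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    # Standard error of the difference between the two means
    se = sd_pooled * sqrt(1 / n1 + 1 / n2)
    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se, diff / sd_pooled


# Hypothetical summary data for two groups of 40 participants each
diff, lower, upper, d = mean_diff_ci(5.2, 1.1, 40, 4.6, 1.3, 40)
print(f"difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], d = {d:.2f}")
```

Reporting the interval alongside the point estimate shows how large the effect might plausibly be and how uncertain the estimate is, rather than only whether a significance threshold was crossed.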

Available under the CC BY 4.0 license.
/content/journals/10.1075/dujal.19019.mul
2020-09-10
2020-09-28
