
The Argument Against P-Values

There is concern that a substantial proportion of published research presents largely false findings (Ioannidis, 2005). This problem stems in part from social science’s reliance on null hypothesis significance testing (NHST), given the incentives to achieve statistical significance (e.g., publications, grant funding). Research in the social sciences has historically adopted a frequentist perspective, primarily reporting results as a dichotomous reject-or-fail-to-reject decision based on whether a test statistic surpasses a critical value and yields a statistically significant p-value (usually p < 0.05). Although useful in several ways, p-values are largely arbitrary metrics of statistical significance (Greenland et al., 2016), and they are often used incorrectly (Gelman, 2016). P-values encourage a binary mindset in which effects are judged as either null or real; however, this binary outlook provides no information on the magnitude or precision of an effect. P-values can also vary dramatically with the population effect size and the sample size (Cumming, 2008). This reliance on an unstable statistical foundation has been discussed at length in the literature (Wasserstein & Lazar, 2016), and while some journals have taken matters into their own hands (for example, Basic and Applied Social Psychology banned p-values and NHST), the field of psychology has largely failed to address the concerns raised by the use of NHST.
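
To see this instability concretely, here is a minimal simulation (all values are illustrative assumptions: a true standardized effect of d = 0.5, 30 participants per group, and 20 replications) showing how widely p-values fluctuate across exact replications of the same two-group study:

# Sketch of the "dance of the p-values": the same true effect,
# resampled 20 times, produces very different p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d, n, replications = 0.5, 30, 20   # assumed effect size and group size

p_values = []
for _ in range(replications):
    control = rng.normal(0.0, 1.0, n)    # control group, mean 0
    treatment = rng.normal(d, 1.0, n)    # treatment group, shifted by d
    _, p = stats.ttest_ind(treatment, control)
    p_values.append(p)

print(f"min p = {min(p_values):.4f}, max p = {max(p_values):.4f}")
print(f"significant at 0.05 in {sum(p < 0.05 for p in p_values)} of {replications} replications")

Some replications land well below 0.05 and others well above it, even though the underlying effect never changes.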

Research is moving toward adopting the “new statistics” as best practice, relying instead on estimation based on effect sizes, confidence intervals, and meta-analysis (Cumming, 2014). We, as graduate students in training, are in a position to push toward thinking in terms of estimation and away from dichotomously constrained interpretations. In contrast to the binary nature of p-values, a confidence interval is a range of plausible values for the population parameter being estimated. Although it may be wide, the confidence interval accurately conveys the uncertainty of the point estimate (Cumming, 2014). For example, a 95% confidence interval for a population mean, μ, is constructed so that if the study were repeated many times, about 95% of the intervals computed this way would contain μ. The APA Publication Manual (APA, 2020) specifically recommends reporting results based on effect size estimates and confidence intervals rather than p-values. P-values are not well suited to drive our field forward in terms of the precision and magnitude of estimates. Researchers should therefore focus on what the data can tell us about the magnitude of effects and the practical significance of those results. It is important for graduate students to adopt practices that produce reproducible and reliable research. One way to do so is to move beyond p-values.
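
That long-run meaning of “95% confidence” can itself be checked by simulation. The sketch below (with made-up population values μ = 100, σ = 15, and n = 25) constructs a standard t-based interval for each of 10,000 samples and counts how often the interval covers μ:

# Coverage check: roughly 95% of 95% confidence intervals contain mu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, n_samples = 100, 15, 25, 10_000   # assumed population values

covered = 0
for _ in range(n_samples):
    sample = rng.normal(mu, sigma, n)
    sem = sample.std(ddof=1) / np.sqrt(n)    # standard error of the mean
    t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% critical value
    lower = sample.mean() - t_crit * sem
    upper = sample.mean() + t_crit * sem
    covered += lower <= mu <= upper

print(f"coverage: {covered / n_samples:.3f}")   # prints roughly 0.95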

How to move beyond p-values:

  • Prioritize estimation instead of null hypothesis testing or p-values
    • Formulate research questions in terms of estimation, e.g., How large is the effect of X on Y? To what extent does X impact Y?
  • Report confidence intervals and corresponding effect sizes (see the sketch after this list)
  • Include confidence intervals in figures (preferred over standard error bars)
  • Base interpretations and conclusions on the magnitude of effects rather than on a dichotomous decision about “statistical significance”
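
To make the reporting recommendation concrete, here is a minimal sketch (the data are simulated, and the interval uses one common large-sample approximation to the standard error of d rather than the exact noncentral-t method) that computes Cohen’s d for two groups along with an approximate 95% confidence interval:

# Report an effect size with a confidence interval, not just a p-value.
import numpy as np
from scipy import stats

def cohens_d_ci(x, y, confidence=0.95):
    """Cohen's d with an approximate CI (large-sample normal approximation)."""
    n1, n2 = len(x), len(y)
    # Pooled standard deviation across the two groups
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    d = (np.mean(x) - np.mean(y)) / sp
    # Approximate standard error of d (Hedges & Olkin)
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(0.5 + confidence / 2)
    return d, (d - z * se, d + z * se)

rng = np.random.default_rng(1)
treatment = rng.normal(0.4, 1.0, 40)   # hypothetical treatment scores
control = rng.normal(0.0, 1.0, 40)     # hypothetical control scores

d, (lo, hi) = cohens_d_ci(treatment, control)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

A write-up would then interpret the size of d and the width of the interval, rather than whether a p-value crossed 0.05.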

References                                                     

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). https://doi.org/10.1037/0000165-000

Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3, 286–300. https://doi.org/10.1111/j.1745-6924.2008.00079.x

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966

Gelman, A. (2016). The problems with p-values are not just with p-values. The American Statistician, 70.

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337–350. https://doi.org/10.1007/s10654-016-0149-3

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108

Written by Marianne Chirica, an APAGS Science Committee member and a third-year graduate student in the Psychological and Brain Sciences Ph.D. program at Indiana University. Feel free to reach out to Marianne with any questions you may have!