A statistically significant finding can be practically meaningless. A statistically non-significant finding can be enormously consequential. Confusing the two — which is endemic in published work — is the most common analytical error we see across Research Goal's methodology cohort. This guide untangles the two and gives you a checklist for evaluating your own results.
Two different questions
Statistical significance answers one question: if the true effect were zero, how surprising would data like mine be, given the noise? Practical significance answers a different question: if the effect is real, is it large enough to matter? These are separate. A result can be one without the other, and routinely is.
The sample-size trap
With a large enough sample, almost any nonzero effect becomes statistically significant. A correlation of 0.05 between two variables, which is practically invisible, reaches p < 0.01 by n = 3,000 and p < 0.001 once n exceeds roughly 4,300. The p-value shrinks as sample size grows; the effect size estimate does not (only its precision improves). If you only report the p-value, you've told the reader almost nothing about whether the finding matters.
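A minimal Python sketch of the trap, assuming a Pearson correlation tested against zero with the usual t statistic. To stay within the standard library it approximates the t distribution with a standard normal, which is reasonable at these sample sizes but not for small n. The function name `correlation_p_value` is illustrative.

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and approximates the
    t distribution with a standard normal (adequate only for large n).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # P(|Z| > |t|) for a standard normal Z
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.05  # the effect size is held fixed
for n in (500, 1_000, 3_000, 5_000, 10_000):
    print(f"n={n:>6}  r={r}  p={correlation_p_value(r, n):.6f}")
```

Running it shows r pinned at 0.05 while p falls steadily as n grows: the sample size, not the effect, is doing the work.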
Effect sizes that matter
Every quantitative paper should report an effect size alongside every p-value. The right effect size depends on your design, but a working researcher should be fluent with these:
Cohen's d — for mean differences
Difference between two means in standard-deviation units. Rough convention: 0.2 small, 0.5 medium, 0.8 large. "Rough" is doing a lot of work, because field context matters: a d of 0.3 in education research can be policy-relevant; a d of 0.3 in physics is noise.
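The definition translates directly into code. This sketch assumes two independent groups and divides by the pooled standard deviation, the most common form of d; variants such as Hedges' g (which adds a small-sample correction) also exist. The function name and data are illustrative.

```python
import math
import statistics

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Sample variances (n - 1 denominator)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```

Here the means differ by 1 and the pooled standard deviation is 2, so d = 0.5: a "medium" effect by convention, whatever that means in your field.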
R² — for explained variance
Proportion of variance in the outcome explained by the predictor(s). An R² of 0.05 means 95% of the variance in your outcome is left unexplained by your model. That can still be meaningful, but it's a different conversation than "my predictor explains everything."
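For a single predictor, R² can be computed from a least-squares fit as one minus the ratio of residual to total sum of squares. A minimal sketch, assuming simple linear regression with an intercept; `r_squared` is a hypothetical helper, not a library function.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 for the least-squares line y ~ a + b*x: 1 - SS_residual / SS_total."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Fit slope and intercept by ordinary least squares
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variance left over after the fit vs. total variance in y
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.25
```

The unexplained share is simply 1 - R², so an R² of 0.05 leaves 95% of the variance unaccounted for.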
Odds ratios — for binary outcomes
Reported routinely in health and social science. An OR of 1.05 with p < 0.001 in a 50,000-person sample is statistically significant and practically tiny. An OR of 3.2 with p = 0.08 in a 50-person sample is statistically non-significant and worth a follow-up study.
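To see why the second case deserves a follow-up, it helps to compute an odds ratio together with its confidence interval. This sketch assumes a 2x2 table with no zero cells and uses the Woolf (log-scale) interval, one standard approximation among several; the counts are made up for illustration.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table:

                   outcome   no outcome
        exposed       a          b
        unexposed     c          d

    Assumes every cell is nonzero.
    """
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

or_hat, lower, upper = odds_ratio_ci(20, 10, 10, 20)  # hypothetical counts
print(f"OR = {or_hat:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With these counts the point estimate is 4, but the interval runs from roughly 1.4 to 11.7: clearly above 1, yet compatible with anything from a modest to a very large association.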
If the only number in your results sentence is a p-value, you have not yet reported your finding — you've reported your noise level.
Confidence intervals — the one number to report
If you can only report one statistic for a finding, report the 95% confidence interval, not the p-value. A CI tells the reader the plausible range of the effect, which answers both the significance question (does the interval include the null value, zero for a difference or 1 for an odds ratio?) and the magnitude question (where does the range sit, and how wide is it?). Increasingly, journals are requiring CIs as the primary statistic. Lead with them.
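A sketch of how one interval carries both answers, assuming a difference between two independent means. It uses the unpooled (Welch) standard error with a normal critical value of 1.96, so it is an approximation that runs somewhat narrow for small samples; `mean_difference_ci` and the scores are hypothetical.

```python
import math
import statistics

def mean_difference_ci(a: list[float], b: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Difference in means (a - b) with an approximate 95% confidence interval.

    Uses the unpooled (Welch) standard error and a normal critical value,
    so the interval is somewhat too narrow when samples are small.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, diff - z * se, diff + z * se

# Hypothetical scores for two independent groups
treatment = [5.1, 6.3, 5.8, 6.0, 5.5]
control = [4.9, 5.2, 5.6, 4.8, 5.3]

diff, lower, upper = mean_difference_ci(treatment, control)
print(f"difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
print("excludes zero (significance question):", not (lower <= 0 <= upper))
print("interval width (precision):", round(upper - lower, 2))
```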
A checklist for any significant result
- Have I reported the effect size, not just the p-value?
- Is the effect size big enough to matter in my field's terms — not just by Cohen's conventions?
- What is the 95% confidence interval, and how wide is it relative to the point estimate?
- Could this effect be an artefact of sample size, multiple testing, or model misspecification?
- Would I make a different decision if the effect were half as large? If yes, the practical-significance bar isn't met yet.
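Several of these checks come down to comparing the interval with the smallest effect that would change a decision in your field. The sketch below assumes a difference whose null value is 0 and a positive threshold `smallest_meaningful` that you set from field knowledge rather than from Cohen's conventions; `read_interval` and its category labels are one reasonable reading, not a standard.

```python
def read_interval(lower: float, upper: float, smallest_meaningful: float) -> str:
    """Classify a CI for a difference (null value 0) against the smallest
    effect that matters in your field (a positive number you choose)."""
    m = smallest_meaningful
    excludes_zero = lower > 0 or upper < 0
    # Whole interval lies at or beyond the meaningful threshold
    beyond_threshold = lower >= m or upper <= -m
    # Whole interval lies strictly inside the "too small to matter" band
    within_threshold = -m < lower and upper < m
    if beyond_threshold:
        return "significant and practically meaningful"
    if excludes_zero and within_threshold:
        return "significant but too small to matter"
    if excludes_zero:
        return "significant; practical importance uncertain"
    if within_threshold:
        return "no meaningful effect: the interval rules out effects that matter"
    return "inconclusive: consistent with zero and with a meaningful effect"

print(read_interval(0.05, 0.20, smallest_meaningful=0.3))  # significant but too small to matter
```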
When non-significant findings still matter
A non-significant result with a large effect size and a wide confidence interval is usually a power problem, not a null result. Don't bury it — report the effect size and CI, name the sample-size limitation, and propose the follow-up study with the right power. "We found no effect" and "we were underpowered to detect a meaningful effect" are very different claims; reviewers respect the latter.
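Proposing a follow-up "with the right power" means choosing a sample size. A rough sketch, assuming a two-sided comparison of two group means and the standard normal-approximation formula n = 2 * (z_alpha/2 + z_beta)^2 / d^2 per group; `n_per_group` is a hypothetical helper. Set d to the smallest effect that would matter, not to the noisy estimate from the underpowered study.

```python
import math
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size to detect a standardized mean
    difference d with a two-sided, two-sample test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance cutoff
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.5))  # 63
```

For d = 0.5 at the defaults this returns 63 per group; exact t-based calculators typically report 64, so treat the result as a floor rather than a target.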
Wrapping up
P-values answer one narrow question about your data. Effect sizes and confidence intervals answer the question your reader actually has — does this matter? Report both, lead with the CI, and call out when sample size is doing the heavy lifting. That's the difference between statistically significant and practically significant.