Everyone who has ever participated in a debate on the validity of science knows that it never takes long before somebody brings up the topic of p-values. Although the p-value is still the most widely used index to assess statistical significance, its usefulness has come under criticism. Numerous papers have pointed out the flaws of this research tool for testing hypotheses and have mocked the strange conclusions it can lead to. But what if the problem is not the p-value itself (which can be a useful statistical measure) but the fact that ‘it is commonly misused and misinterpreted’?
The American Statistical Association (ASA) Statement on p-values (2016) aims to clarify some of the underlying principles of the p-value and gives a clear description of how this index should be used. Moreover, it pays special attention to the importance of contextual factors and the way researchers should take these into account. The report was written for a general scientific audience and comes warmly recommended for all who wish to understand what p-values are but have always been afraid to ask.
The ASA statement is short and takes little effort to read. To better understand where things go wrong with null-hypothesis significance testing (NHST), and especially with interpreting the p-value, several publications are available. We name a few.
One of the supplementary materials of the ASA statement is Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. After a critical discussion of basic statistics and explanations of why 25 common misinterpretations of p-values, confidence intervals and power are wrong, the authors comfortingly provide guidance on what to do instead.
Steven Goodman earlier discussed a dozen p-value misconceptions (2008), including the possible consequences of each wrong interpretation. It brought Goodman to the conclusion that we must open our eyes to alternative methodologies, such as the Bayes factor.
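One way to get a feel for how a p-value relates to the Bayes factor is the well-known lower bound on the Bayes factor for the null hypothesis, BF ≥ −e·p·ln(p) (valid for p < 1/e), which Goodman's discussion draws on. The sketch below is our own illustration, not code from any of the cited publications:

```python
import math

def min_bayes_factor(p):
    """Lower bound -e * p * ln(p) on the Bayes factor in favour of H0
    versus any alternative; only valid for 0 < p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound only holds for 0 < p < 1/e")
    return -math.e * p * math.log(p)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:.3f}  ->  minimum Bayes factor ~ {min_bayes_factor(p):.3f}")
```

For p = 0.05 the bound is about 0.41, meaning the data lower the odds of the null hypothesis by at most a factor of roughly 2.5, far weaker evidence than the "1 in 20" that p = 0.05 intuitively suggests.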
Pitfalls not only lie in the interpretation of the p-value, but lurk in all phases of, for example, psychological research. With this disciplinary field in mind, in which NHST is widely used, researchers of the department of Methodology and Statistics of Tilburg University created a list of 34 Researcher Degrees of Freedom. This list gives insight into the many decisions (some more explicit or arbitrary than others) that researchers have to make.
Although publications like these are out there, helping researchers to improve their methodology and reporting, and aiming to improve the interpretation (and application) of study results, their message has not yet made it into the most widely used textbooks. Use publications like these to educate the next generation of students and equip them with the knowledge to make sound decisions, assessments and interpretations.
One of the main points of critique on null-hypothesis significance testing is that the p-value, the probability of the observed data (or more extreme data) given a null effect in the study population, is easily mistaken for the probability of the null hypothesis being ‘true’. Several alternative methods avoid this confusion, yet the p-value is still widely used. Why does the majority of researchers hold on to a method that is under fire while alternatives are available? Researchers of the Vrije Universiteit aim to answer this question with their research project ‘The Myth of Null-hypothesis Significance Testing’.
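A small simulation can make this distinction concrete. The sketch below (our own illustration, with an assumed two-sided z-test, an assumed effect size of 0.3, and an assumed 50/50 mix of true and false nulls) counts how often a "significant" result at p < 0.05 actually came from a true null:

```python
import random
from statistics import NormalDist

random.seed(42)
norm = NormalDist()

def one_study(mu, n=25):
    """Simulate one study: a two-sided z-test of H0: mu = 0 with known sd = 1."""
    xs = [random.gauss(mu, 1) for _ in range(n)]
    z = (sum(xs) / n) * n ** 0.5
    return 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

null_hits = effect_hits = 0
for _ in range(20000):
    if random.random() < 0.5:              # half the studies test a true null
        if one_study(0.0) < 0.05:
            null_hits += 1
    else:                                  # half test a real effect (d = 0.3, assumed)
        if one_study(0.3) < 0.05:
            effect_hits += 1

share = null_hits / (null_hits + effect_hits)
print(f"share of 'significant' results where H0 was true: {share:.2f}")
```

Under these assumptions, well over 5% of the significant results come from true nulls (roughly 13% here), showing that p < 0.05 is not the probability that the null hypothesis is true; that probability depends on the prior plausibility of the effect and the power of the study.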
Not only NHST itself and the interpretation of the p-value cause problems; reporting the p-value can also be difficult if it is not quite what you hoped for. But no need to worry: here are plenty of examples of how colleagues before you tried to make their p > 0.05 sound interesting after all. Have fun!
by Isabella Vos and Fenneke Blom
To read more about this topic on our website, use the tag ‘statistical significance’ for related posts.
NRIN devotes a great deal of attention to the website’s content and would greatly appreciate your suggestions of documents or links you believe to belong on this website.
This selection is an incomplete convenience sample, and does not reflect NRIN’s vision or opinion. Including an item does not necessarily mean that we agree with the authors nor does it imply we think unmentioned items are of poorer quality.
Please report any suggestions, inaccuracies or nonfunctioning hyperlinks etc. that you discover. Thanks in advance!