Glenn Shafer, Rutgers Business School
Calibrate p-values by taking the square root

For nearly 100 years, researchers have persisted in using p-values in spite of fierce criticism. Both Bayesians and Neyman-Pearson purists contend that use of a p-value is cheating even in the simplest case, where the hypothesis to be tested and a test statistic are specified in advance. Bayesians point out that a small p-value often does not translate into a strong Bayes factor against the hypothesis. Neyman-Pearson purists insist that you should state a significance level in advance and stick with it, even if the p-value turns out to be much smaller than that significance level. But many applied statisticians continue to feel that a p-value much smaller than the significance level is meaningful evidence.

In the game-theoretic approach to probability (see my 2001 book with Vladimir Vovk, described at www.probabilityandfinance.com), you test a statistical hypothesis by using its probabilities to bet. You reject at a significance level of 0.01, say, if you succeed in multiplying the capital you risk by 100. In this picture, we can calibrate small p-values so as to measure their meaningfulness while absolving them of cheating. There are various ways to implement this calibration, but one of them leads to a very simple rule of thumb: take the square root of the p-value. Thus rejection at a significance level of 0.01 requires a p-value of one in 10,000, since the square root of 1/10,000 is 1/100.
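As a minimal sketch of this rule of thumb in Python (the function names are mine, chosen for illustration, and the reading of 1/sqrt(p) as the factor by which the bet multiplies the capital risked is my gloss on the capital-multiplication criterion above):

    import math

    def calibrated_p_value(p: float) -> float:
        """Rule of thumb from the text: report sqrt(p) in place of p."""
        return math.sqrt(p)

    def reject(p: float, significance_level: float) -> bool:
        """Reject only if the calibrated p-value reaches the stated level.

        Equivalently (my gloss): 1/sqrt(p) is the factor by which the bet
        multiplies the capital risked, and rejection at the given level
        requires this factor to reach 1/significance_level.
        """
        return calibrated_p_value(p) <= significance_level

    print(calibrated_p_value(0.0001))  # 0.01: one in 10,000 calibrates to 0.01
    print(reject(0.0001, 0.01))        # True: capital multiplied by 100
    print(reject(0.001, 0.01))         # False: sqrt(0.001) is about 0.032

The same arithmetic shows how much the calibration discounts conventional results: a p-value of 0.05 calibrates to about 0.22, which is the sense in which a raw p-value overstates the evidence against the hypothesis.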