
python's scipy.stats.ranksums vs. R's wilcox.test


It depends on the choice of options (an exact test vs. a normal approximation, with or without continuity correction):

R's default:

By default (if ‘exact’ is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used.
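For example, tied values force the switch to the approximation. A hypothetical illustration (integer draws, chosen only because they guarantee ties):

> set.seed(101)
> a <- sample(1:5, 20, replace=TRUE)
> b <- sample(1:5, 20, replace=TRUE)
> wilcox.test(a, b)
## warns "cannot compute exact p-value with ties"
## and silently uses the normal approximation instead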

Default (as shown above):

> wilcox.test(x, y)

        Wilcoxon rank sum test

data:  x and y
W = 182, p-value = 9.971e-08
alternative hypothesis: true location shift is not equal to 0
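If you're curious where that exact p-value comes from: it's a tail probability of the null distribution of W, which R exposes as pwilcox(). A sketch of roughly what wilcox.test() does internally, assuming x and y are your original (unshown) samples, no ties, and W above its null mean, as it is here:

> nx <- length(x); ny <- length(y)
> min(2 * pwilcox(182 - 1, nx, ny, lower.tail=FALSE), 1)
## should reproduce 9.971e-08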

Normal approximation with continuity correction:

> wilcox.test(x, y, exact=FALSE, correct=TRUE)

        Wilcoxon rank sum test with continuity correction

data:  x and y
W = 182, p-value = 1.125e-05
alternative hypothesis: true location shift is not equal to 0

Normal approximation without continuity correction:

> (w0 <- wilcox.test(x, y, exact=FALSE, correct=FALSE))

        Wilcoxon rank sum test

data:  x and y
W = 182, p-value = 1.006e-05
alternative hypothesis: true location shift is not equal to 0

For a little more precision:

> w0$p.value
[1] 1.005997e-05

It looks like the other value Python is giving you (4.415880433163923) is the Z-score:

> 2*pnorm(4.415880433163923, lower.tail=FALSE)
[1] 1.005997e-05
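You can also get that z-score directly from W. A sketch of the arithmetic behind the normal approximation, again assuming x and y are your samples and there are no ties:

> nx <- length(x); ny <- length(y)
> mu <- nx * ny / 2                            # null mean of W
> sigma <- sqrt(nx * ny * (nx + ny + 1) / 12)  # null SD of W (no ties)
> (182 - mu) / sigma                           # the 4.4158... Python reports
> ## the continuity correction just shrinks |W - mu| by 0.5 first,
> ## which is where 1.125e-05 vs. 1.006e-05 comes from:
> 2 * pnorm((abs(182 - mu) - 0.5) / sigma, lower.tail=FALSE)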

I can appreciate wanting to know what's going on, but I would also point out that there is rarely any practical difference between p=1e-7 and p=1e-5 ...