In Null Hypothesis Significance Tests, the p-value is the probability of observing an effect larger than or equal to the measured metric delta, under the assumption that the null hypothesis is true. In practice, a p-value that is lower than your pre-defined Type I error threshold (α) is treated as evidence of a true effect.

The methodology used for p-value calculation depends on the number of degrees of freedom (ν). A two-sample z-test is appropriate for most experiments. Welch's t-test is used for smaller experiments with ν < 100. In both cases, the p-value depends on the metric mean and variance computed for the test and control groups.

Typically, a p-value that indicates statistical significance (below the pre-determined threshold α) occurs only together with a confidence interval that does not cross 0. In the Statsig UI, however, the two can disagree: the p-value of the absolute difference between test and control can be statistically significant while, because of uncertainty in the control, the relative delta confidence interval crosses zero (using the Delta Method) or is represented as a point estimate (using Fieller Intervals).
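The z-statistic (a.k.a. z-score) of a two-sample z-test can be written in multiple equivalent forms:

$$
Z = \frac{\overline{X}_t - \overline{X}_c}{\sqrt{\text{var}(\overline{X}_t) + \text{var}(\overline{X}_c)}} = \frac{\overline{X}_t - \overline{X}_c}{\sqrt{\text{var}(\Delta \overline{X})}} = \frac{\overline{X}_t - \overline{X}_c}{\sqrt{\sigma_{\overline{X}_t}^2 + \sigma_{\overline{X}_c}^2}}
$$

where:
$Z$ is the observed z-statistic (not the z-critical value $Z_{\alpha/2}$)
$\text{var}(\Delta \overline{X})$ is the variance of the absolute delta of means
$\text{var}(\overline{X}_i)$ is the variance of the sample mean of either the control or the treatment group (details here)
$\sigma_{\overline{X}_i}$ is the standard error of the mean of either the control or the treatment group (these are the terms you can find in Pulse under the Statistics tab of a metric)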
The two-sided p-value is obtained from the standard normal cumulative distribution function:

$$
p\text{-value} = 2 \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-|Z|} e^{-t^2/2} \, dt
$$
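The two-sample z-test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Statsig's implementation: the function names and the input numbers are hypothetical, and the inputs are assumed to be the group means and the variances of the sample means (the squared standard errors).

```python
import math


def z_statistic(mean_t, mean_c, var_mean_t, var_mean_c):
    """Two-sample z-statistic from group means and variances of the sample means."""
    return (mean_t - mean_c) / math.sqrt(var_mean_t + var_mean_c)


def two_sided_p_value(z):
    """Two-sided p-value, 2 * Phi(-|z|), where Phi is the standard normal CDF."""
    # Phi(x) = 0.5 * erfc(-x / sqrt(2)), so 2 * Phi(-|z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical inputs, chosen only for illustration.
z = z_statistic(mean_t=10.5, mean_c=10.0, var_mean_t=0.04, var_mean_c=0.05)
p = two_sided_p_value(z)
print(f"Z = {z:.3f}, two-sided p-value = {p:.4f}")
```

Using `math.erfc` rather than `1 - cdf` keeps precision in the tails, where p-values are very small.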
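For smaller sample sizes, Welch's t-test is the preferred statistical test because it gives lower false positive rates when group sizes and variances are unequal. In Pulse, Welch's t-test is automatically applied when the degrees of freedom ν < 100.

We compute the t-statistic (a.k.a. t-score) identically to the two-sample z-statistic above. Additionally, we compute the degrees of freedom ν using:

$$
\nu = \frac{\left(\text{var}(\overline{X}_t) + \text{var}(\overline{X}_c)\right)^2}{\dfrac{\text{var}(\overline{X}_t)^2}{N_t - 1} + \dfrac{\text{var}(\overline{X}_c)^2}{N_c - 1}} = \frac{\text{var}(\Delta \overline{X})^2}{\dfrac{\text{var}(\overline{X}_t)^2}{N_t - 1} + \dfrac{\text{var}(\overline{X}_c)^2}{N_c - 1}}
$$

where $N_t$ and $N_c$ are the sample sizes of the test and control groups. The p-value is then obtained from the t-distribution with ν degrees of freedom.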
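As a rough sketch of the Welch procedure, the snippet below computes ν and a two-sided p-value using only the Python standard library. It is not Statsig's implementation: the function names and inputs are hypothetical, and the t-distribution tail is obtained by integrating the density numerically with Simpson's rule, a stand-in for a library CDF.

```python
import math


def welch_degrees_of_freedom(var_mean_t, var_mean_c, n_t, n_c):
    """Welch-Satterthwaite degrees of freedom from variances of the sample means."""
    numerator = (var_mean_t + var_mean_c) ** 2
    denominator = var_mean_t ** 2 / (n_t - 1) + var_mean_c ** 2 / (n_c - 1)
    return numerator / denominator


def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def two_sided_t_p_value(t_stat, nu, steps=2000):
    """Two-sided p-value: 1 minus the t density integrated over [-|t|, |t|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    half_mass = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_mass)


# Hypothetical inputs, chosen only for illustration.
nu = welch_degrees_of_freedom(var_mean_t=0.04, var_mean_c=0.05, n_t=30, n_c=25)
t_stat = (10.5 - 10.0) / math.sqrt(0.04 + 0.05)  # same formula as the z-statistic
print(f"nu = {nu:.1f}, use Welch's t-test: {nu < 100}")
print(f"two-sided p-value = {two_sided_t_p_value(t_stat, nu):.4f}")
```

With equal variances and equal group sizes, ν reduces to 2(N − 1); as ν grows, the result approaches the z-test p-value.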
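The procedure for a one-sided z-test computes the z-statistic $Z$ in the same way as the two-sided test above.

The one-sided p-value is also obtained from the standard normal cumulative distribution function, but with slight differences:

$$
p\text{-value} =
\begin{cases}
1 - \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{Z} e^{-t^2/2} \, dt & \text{if right-hand test} \\[2ex]
\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{Z} e^{-t^2/2} \, dt & \text{if left-hand test}
\end{cases}
$$

where: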
Z is computed above in the two-sided test. Note that this uses the signed z-statistic, not the absolute value of the z-statistic as in the two-sided p-value.
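A minimal sketch of the one-sided calculation follows. The function name and the `direction` argument are hypothetical names introduced for illustration; the point is that the signed z-statistic is passed in unchanged.

```python
import math


def one_sided_p_value(z, direction):
    """One-sided p-value from the signed z-statistic.

    direction: "right" gives 1 - Phi(z); "left" gives Phi(z),
    where Phi is the standard normal CDF.
    """
    if direction == "right":
        # 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
        return 0.5 * math.erfc(z / math.sqrt(2))
    if direction == "left":
        # Phi(z) = 0.5 * erfc(-z / sqrt(2))
        return 0.5 * math.erfc(-z / math.sqrt(2))
    raise ValueError("direction must be 'right' or 'left'")


# Hypothetical signed z-statistic, chosen only for illustration.
z = 1.2
print(f"right-hand p-value = {one_sided_p_value(z, 'right'):.4f}")
print(f"left-hand p-value  = {one_sided_p_value(z, 'left'):.4f}")
```

For any Z, the right-hand and left-hand p-values sum to 1, so a positive Z can only be significant in a right-hand test and a negative Z only in a left-hand test.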