Mean estimator and PPI++

For labeled sample size $n$ and unlabeled sample size $N$:

Estimator:

$$\hat\theta_\lambda = \bar{Y}_L + \lambda(\bar{f}_U - \bar{f}_L).$$

Variance:

$$\operatorname{Var}(\hat\theta_\lambda) = \frac{\sigma_Y^2}{n} + \lambda^2 \sigma_f^2 \left(\frac{1}{N} + \frac{1}{n}\right) - \frac{2\lambda\,\operatorname{Cov}(Y,f)}{n}.$$

Minimizing over $\lambda$ yields

$$\lambda^* = \frac{\operatorname{Cov}(Y,f)}{(1 + n/N)\sigma_f^2}, \qquad \operatorname{Var}(\hat\theta_{\lambda^*}) = \frac{\sigma_Y^2}{n} - \frac{\operatorname{Cov}(Y,f)^2}{\sigma_f^2}\frac{N}{n(n+N)}.$$
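The minimizer follows from setting the derivative of the variance with respect to $\lambda$ to zero. The variance is a convex quadratic in $\lambda$ because the coefficient $\sigma_f^2(1/N + 1/n)$ is positive, so the stationary point is the minimum. A short derivation, using only the symbols defined above:

```latex
\frac{\partial}{\partial\lambda}\operatorname{Var}(\hat\theta_\lambda)
  = 2\lambda\,\sigma_f^2\left(\frac{1}{N} + \frac{1}{n}\right)
    - \frac{2\operatorname{Cov}(Y,f)}{n} = 0
\;\Longrightarrow\;
\lambda^* = \frac{\operatorname{Cov}(Y,f)/n}{\sigma_f^2\,(1/N + 1/n)}
          = \frac{\operatorname{Cov}(Y,f)}{(1 + n/N)\,\sigma_f^2},

\operatorname{Var}(\hat\theta_{\lambda^*})
  = \frac{\sigma_Y^2}{n} - \frac{\lambda^*\operatorname{Cov}(Y,f)}{n}
  = \frac{\sigma_Y^2}{n}
    - \frac{\operatorname{Cov}(Y,f)^2}{\sigma_f^2}\,\frac{N}{n(n+N)}.
```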

lambda_opt <- function(n, N, cov_y_f, var_f) cov_y_f / ((1 + n / N) * var_f)
lambda_opt(n = 200, N = 5000, cov_y_f = 0.6, var_f = 0.6)
#> [1] 0.9615385
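As a sanity check, the closed-form minimizer can be compared with a numerical minimization of the variance formula. The sketch below uses base R only; `var_theta()` and `lambda_closed()` are hypothetical helper names introduced for illustration, and `sigma_y2 = 1.0` is an assumed value that does not affect the location of the minimum.

```r
# Variance of the estimator as a function of lambda
# (hypothetical helper; a direct transcription of the variance formula).
var_theta <- function(lambda, n, N, sigma_y2, sigma_f2, cov_y_f) {
  sigma_y2 / n + lambda^2 * sigma_f2 * (1 / N + 1 / n) - 2 * lambda * cov_y_f / n
}

# Closed-form minimizer (same formula as the lambda_opt() one-liner).
lambda_closed <- function(n, N, cov_y_f, var_f) cov_y_f / ((1 + n / N) * var_f)

n <- 200; N <- 5000
sigma_y2 <- 1.0   # assumed for illustration
sigma_f2 <- 0.6
cov_y_f  <- 0.6

lam_star <- lambda_closed(n, N, cov_y_f, sigma_f2)

# Numerical minimizer over a bracket that contains the optimum.
lam_num <- optimize(var_theta, interval = c(0, 2),
                    n = n, N = N, sigma_y2 = sigma_y2,
                    sigma_f2 = sigma_f2, cov_y_f = cov_y_f)$minimum

# Closed-form minimum variance.
var_closed <- sigma_y2 / n - cov_y_f^2 / sigma_f2 * N / (n * (n + N))

c(lambda_closed = lam_star, lambda_numeric = lam_num)
c(var_at_optimum = var_theta(lam_star, n, N, sigma_y2, sigma_f2, cov_y_f),
  var_closed     = var_closed,
  var_classical  = sigma_y2 / n)
```

The two values of lambda should agree up to the tolerance of `optimize()`, and the variance at the optimum should fall below the classical `sigma_y2 / n` whenever the covariance is nonzero.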

Power and required $n$

Given effect size $\Delta$, target power $1-\beta$, and level $\alpha$, the labeled size $n$ must satisfy

$$\frac{\lvert\Delta\rvert}{\sqrt{\operatorname{Var}(\hat\theta_{\lambda^*})}} \ge z_{1-\alpha/2} + z_{1-\beta}.$$

power_ppi_mean() solves this directly; use it instead of hand algebra:

power_ppi_mean(
  delta      = 0.2,
  N          = 5000,
  n          = NULL,
  power      = 0.9,
  sigma_y2   = 1.0,
  sigma_f2   = 0.6,
  cov_y_f    = 0.6,
  lambda_mode = "oracle"
)
#> [1] 109
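To see what this inequality implies, the smallest $n$ can also be found by direct search over the closed-form optimal variance. This is a minimal sketch in base R, not the package's solver: `var_opt()` and `required_n()` are hypothetical helpers, and `alpha = 0.05` is an assumption, since the call above does not set a level explicitly.

```r
# Variance at the optimal lambda (closed form from the previous section).
var_opt <- function(n, N, sigma_y2, sigma_f2, cov_y_f) {
  sigma_y2 / n - cov_y_f^2 / sigma_f2 * N / (n * (n + N))
}

# Smallest labeled size n that meets the power condition, by linear search.
# Hypothetical helper for illustration; alpha = 0.05 is an assumed default.
required_n <- function(delta, N, power, sigma_y2, sigma_f2, cov_y_f,
                       alpha = 0.05, n_max = 1e6) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  target_var <- (delta / z_sum)^2   # largest variance that still gives the power
  n <- 2
  while (n <= n_max && var_opt(n, N, sigma_y2, sigma_f2, cov_y_f) > target_var) {
    n <- n + 1
  }
  if (n > n_max) NA_integer_ else as.integer(n)
}

required_n(delta = 0.2, N = 5000, power = 0.9,
           sigma_y2 = 1.0, sigma_f2 = 0.6, cov_y_f = 0.6)
#> [1] 109
```

A linear search is slow for very large $n$ but keeps the logic transparent; setting `cov_y_f = 0` reduces the condition to the classical one-sample requirement.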

Metrics-to-moments mapping (binary)

For sensitivity and specificity $(\text{sens}, \text{spec})$ and prevalence $p$,

$$p_{\hat{Y}} = \text{sens}\,p + (1-\text{spec})(1-p), \qquad \operatorname{Cov}(Y,f) = p(1-p)(\text{sens} + \text{spec} - 1).$$

The helper binary_moments_from_sens_spec() returns $\operatorname{Var}(Y)$, $\operatorname{Var}(f)$, and $\operatorname{Cov}(Y,f)$, along with the predicted-positive rate $p_{\hat{Y}}$ (as p_hat):

binary_moments_from_sens_spec(p = 0.3, sens = 0.85, spec = 0.85)
#> $sigma_y2
#> [1] 0.21
#> 
#> $sigma_f2
#> [1] 0.2304
#> 
#> $cov_y_f
#> [1] 0.147
#> 
#> $p_hat
#> [1] 0.36
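These values can be reproduced by hand from the mapping above, treating both the label $Y$ and the prediction $f$ as Bernoulli variables. The function below is a hypothetical re-implementation for illustration (`moments_by_hand()` is not part of the package), assuming the variances are the usual Bernoulli variances $p(1-p)$ and $p_{\hat{Y}}(1-p_{\hat{Y}})$.

```r
# Moments of a binary label Y and a binary prediction f from
# prevalence, sensitivity, and specificity.
# Hypothetical helper for illustration; not the package's source.
moments_by_hand <- function(p, sens, spec) {
  # P(f = 1): true positives plus false positives
  p_hat <- sens * p + (1 - spec) * (1 - p)
  list(
    sigma_y2 = p * (1 - p),            # Var(Y) for Bernoulli(p)
    sigma_f2 = p_hat * (1 - p_hat),    # Var(f) for Bernoulli(p_hat)
    # Cov(Y, f) = E[Y f] - E[Y] E[f] = sens * p - p * p_hat,
    # which simplifies to p (1 - p) (sens + spec - 1)
    cov_y_f  = p * (1 - p) * (sens + spec - 1),
    p_hat    = p_hat
  )
}

m <- moments_by_hand(p = 0.3, sens = 0.85, spec = 0.85)
str(m)

# Implied correlation between label and prediction
m$cov_y_f / sqrt(m$sigma_y2 * m$sigma_f2)
```

The covariance vanishes when `sens + spec = 1`, that is, when the classifier is no better than chance; in that case the predictions provide no variance reduction.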

Use these outputs directly in power_ppi_mean().