Estimation and inference for AR-1 spike problem using an L0 penalty

spike_inference(
  dat,
  decay_rate,
  tuning_parameter,
  window_size,
  sig2 = NULL,
  return_conditioning_sets = FALSE,
  return_ci = FALSE,
  two_sided = FALSE,
  alpha = 0.05
)

Arguments

dat

Numeric vector; observed data.

decay_rate

Numeric; specified AR-1 decay rate \(\gamma\), a number between 0 and 1 (non-inclusive).

tuning_parameter

Numeric; tuning parameter \(\lambda\) for L0 spike estimation, a non-negative number.

window_size

Numeric; window size for fixed window hypothesis testing, a non-negative integer.

sig2

Numeric; noise variance for the observed data, a non-negative number. If unknown (NULL), sample variance of residuals is used instead.

return_conditioning_sets

Logical; Should the conditioning set S be returned?

return_ci

Logical; if TRUE, the confidence interval for the change in calcium is computed and returned.

two_sided

Logical; if TRUE, a 2-sided p-value is computed and returned.

alpha

Numeric; significance level for the hypothesis test, a number between 0 and 1 (non-inclusive).

Value

Returns a list with elements:

  • spikes the set of spikes,

  • pvals p-values associated with each spike,

  • LCB lower confidence band for each spike,

  • UCB upper confidence band for each spike.

Details

Consider the AR-1 generative model $$Y_t = c_t + \epsilon_t, \epsilon_t \sim N(0, \sigma^2),$$ where \(c_t = \gamma c_{t-1} + z_t\) and \(z_t \sim Poisson(poisMean)\). In words, this says between spikes (when \(z_t=0\)), calcium decays exponentially at a known rate \(\gamma\in(0,1)\), which is taken to be known. Further denote the locations of true spikes, \(\{t:z_t\geq 0\}\) as \(\{0 = \tau_0 < \tau_1 < \ldots < \tau_K < \tau_{K+1} = T\}\).

This function first estimates spikes via L0 penalty based on noisy observations \(y_t, t = 1, \ldots, T\) by solving the following optimization problem $$ minimize_{c_1,...,c_T\geq 0} \frac{1}{2} \sum_{t=1}^{T} ( y_t - c_t )^2 + \lambda \sum_{t=2}^{T} 1(c_t \neq \gamma c_t-1).$$ Estimated spikes correspond to the time t such that estimated calcium does not decay exponentially, i.e., \(\{\cdots,\hat{\tau}_j,\cdots\} = \{t: \hat{c}_{t+1} -\gamma c_{t} \neq 0\} \).

Now suppose that we want to test whether the calcium is exponentially decaying near an estimated spike \(\hat{\tau}_j\); or equivalently, the null hypothesis of the form \(H_{0}: \nu^T c = 0\) versus \(H_{1}: \nu^T c> 0\) for suitably chosen \(\nu\) (see Section 2 in Chen et al. (2021+) for details). This function computes the following p-value $$P(\nu^T Y \geq \nu^T y | \hat{\tau}_j \in M(Y), \nu^T Y> 0, \Pi_\nu^\perp Y = \Pi_\nu^\perp y)$$ where \(M(Y)\) is the set of spikes estimated from Y via the L0 method, \(\Pi_\nu^\perp\) is the orthogonal projection to the orthogonal complement of \(\nu\). In particular, this p-value controls the selective Type I error (see Section 3 in Chen et al. (2021+) for details).

In addition, we implement a \(1-\alpha\) confidence interval for the parameter \( \nu^T c\), the increase in calcium associated with an estimated spike \(\hat{\tau}_j\) (see Section 4 in Chen et al. (2021+) for details).

References

Chen YT, Jewell SW, Witten DM. (2021+) Quantifying uncertainty in spikes estimated from calcium imaging data. arXiv:2103.0781 [statME].

Jewell, S. W., Hocking, T. D., Fearnhead, P., & Witten, D. M. (2019). Fast nonconvex deconvolution of calcium imaging data. Biostatistics.

Jewell, S., Fearnhead, P., and Witten, D. (2019+). Testing for a change in mean after changepoint detection. Technical report.

Maidstone, R., Hocking, T., Rigaill, G., & Fearnhead, P. (2017). On optimal multiple changepoint algorithms for large data. Statistics and Computing, 27(2), 519-533.

Rigaill, G. (2015). A pruned dynamic programming algorithm to recover the best segmentations with 1 to K_max change-points. Journal de la Societe Francaise de Statistique, 156(4), 180-205.

Examples

gam <- 0.98 LAMBDA <- 0.7 sigma <- 0.3 n_length <- 1000 curr_sim <- simulate_ar1(n = n_length, gam = gam, poisMean = 0.01, sd = sigma, seed = 2) curr_fit_spike <- spike_estimates(dat = curr_sim$fl, decay_rate = gam, tuning_parameter = LAMBDA) curr_inference_spike <- spike_inference(dat = curr_sim$fl, decay_rate = gam, tuning_parameter = LAMBDA, window_size = 2, sig2 = sigma*sigma, return_ci = TRUE) ### a summary of the inferential results summary(curr_inference_spike)
#> estimated_spikes pvals LCB UCB #> 1 95 0.0026865876 0.3313205 1.5624120 #> 2 237 0.3080037641 -0.9602564 0.9293162 #> 3 369 0.0003211625 0.5423357 1.7345743 #> 4 463 0.0489093674 -0.1322414 1.1940410 #> 5 503 0.0086375856 0.1706250 1.4210234 #> 6 515 0.0007364613 0.4639493 1.6659810 #> 7 538 0.7342421299 -8.5600416 1.0333508 #> 8 561 0.1274106669 -0.5842248 1.2708549 #> 9 571 0.0011170360 0.4546186 1.6866898 #> 10 610 0.0370503683 -0.1092493 1.4964073 #> 11 627 0.0020417663 0.3977844 1.6573791 #> 12 662 0.0046657711 0.2894321 1.5793793 #> 13 781 0.0059471283 0.2203090 1.4564512 #> 14 914 0.0142668035 0.1420499 1.6782515 #> 15 924 0.6120957544 -4.3107861 1.0106799 #> 16 946 0.0011172539 0.4747208 1.7228907