Skip to content

Efficient algorithms for detecting changes in data streams, based on the Focus algorithm. Includes Ex-FOCuS as well as NP-FOCuS.

License

Notifications You must be signed in to change notification settings

gtromano/exfocus.rcpp

Repository files navigation

exfocus.rcpp

Installation

You can install the development version of exfocus.rcpp from source with:

# If you have devtools installed:
devtools::install_github("gtromano/exfocus.rcpp")
# Or, if building locally:
# setwd("path/to/exfocus.rcpp")
# devtools::install()

Or, if you have the package source directory:

install.packages("path/to/exfocus.rcpp", repos = NULL, type = "source")

Available Functions

  • focus_offline(...)
    Run the FOCuS detector in batch/offline mode (all cycles handled in C++).

  • npfocus_offline(...)
    Run the NP-FOCuS detector for multiple quantiles in batch/offline mode.

  • detector_create(family, theta0, args)
    Create a new online detector object for a given family.

  • detector_update(detector, y)
    Update the detector with a new observation.

  • detector_statistic(detector)
    Get the current value of the detector statistic.

  • computeprobsNPPELT(n)
    Compute quantile probabilities for NP-FOCuS.

exfocus Rcpp offline implementation for testing purposes

The first set of examples demonstrates the offline implementation, where all cycles and updates are handled internally in C++ for maximum efficiency. This approach is ideal for benchmarking and testing, as it closely reflects the performance you would expect from a pure C++ implementation.

While an online implementation is also available—allowing you to update the detector sequentially from R—this will be inherently slower, since each update involves a call from R into C++. The online interface is useful for real-time or streaming scenarios, but is not as fast as the fully internal C++ loop.

If you wish to use the library entirely in C++ (for maximum speed or integration into other C++ projects), you can do so by following the patterns in focus_rcpp_wrapper.cpp.

Gaussian

Pre change parameter known

Here we test the pre-change parameter known recursion, for gaussian change in mean.

theta0 <- 20
set.seed(45)
Y <- c(rnorm(1e3, theta0), rnorm(500, theta0 - 1))

res <- focus_offline(Y, 50, family = "gaussian", theta0 = theta0, args = list(), adp_max_check = F)
plot(res$stat, type = "l")

pre-change unknown

To specify the pre-change unkown we have to set the theta0 parameter to NaN.

theta0 <- 20
set.seed(45)
Y <- c(rnorm(1e3, theta0), rnorm(500, theta0 - 1))

res <- focus_offline(Y, 50, family = "gaussian", theta0 = NaN, args = list(), adp_max_check = F)
plot(res$stat, type = "l")

Gamma

known

If you wish to perform exponential change in rate, set the shape argument to 1!

theta0 <- 4
shape <- 4
set.seed(42)
Y <- c(rgamma(500, rate = theta0, shape = shape), rgamma(500, rate = theta0 - 1, shape = shape))

system.time(res <- focus_offline(Y, 50, family = "gamma", theta0 = 1/theta0, args = list(shape = 4), adp_max_check = F))
   user  system elapsed 
  0.001   0.000   0.000 
plot(res$stat, type = "l")

unknown

theta0 <- 4
shape <- 4
set.seed(42)
Y <- c(rgamma(500, rate = theta0, shape = shape), rgamma(500, rate = theta0 - 1, shape = shape))

# note that I have coded it for change in scale! As in the paper, but this can be changed any time in the interface. 
system.time(res <- focus_offline(Y, 50, family = "gamma", theta0 = NaN, args = list(shape = 4), adp_max_check = F))
   user  system elapsed 
  0.001   0.000   0.000 
plot(res$stat, type = "l")

Other distributions

Similarly, we have Bernoulli and Poisson.

### poisson ####
theta0 <- 2
set.seed(42)
Y <- c(rpois(500, theta0), rpois(500, 1.5))

system.time(res <- focus_offline(Y, 50, family = "poisson", theta0 = theta0, args = list(shape = 4), adp_max_check = F))
   user  system elapsed 
  0.000   0.000   0.001 
plot(res$stat, type = "l")

#### bernoulli ####
theta0 <- 0.4
set.seed(45)
Y <- c(rbinom(1e3, 1, theta0), rbinom(1e3, 1, theta0 + .2))
system.time(res <- focus_offline(Y, 50, family = "bernoulli", theta0 = theta0, args = list(), adp_max_check = F))
   user  system elapsed 
  0.000   0.000   0.001 
plot(res$stat, type = "l")

exfocus online implementation

It is possible to run the FOCuS detector in an online (sequential) fashion, updating the statistic as new data arrives. This is useful for real-time change detection, where you want to process each observation as it comes in, rather than in batch.

theta0 <- 0
set.seed(45)
Y <- c(rnorm(1e3, theta0), rnorm(500, theta0 - 1))

# Create a detector for the Gaussian family with unknown pre-change mean
det <- detector_create("gaussian", NaN, list())

# Sequentially update the detector and record the statistic
stat_trace <- numeric(length(Y))
for (i in seq_along(Y)) {
  detector_update(det, Y[i])
  stat_trace[i] <- detector_statistic(det)
}

plot(stat_trace, type = "l", main = "Online FOCuS Statistic (Gaussian)", ylab = "Statistic", xlab = "Time")

NP-FOCuS

Finally, NP-FOCuS can run as:

set.seed(42)
Y <- c(rnorm(5000), rnorm(100, sd = 2))


probs <- computeprobsNPPELT(15)
quants <- qnorm(probs)
system.time(res <- npfocus_offline(Y = Y, threshold = c(90, 15), quantiles = quants))
   user  system elapsed 
  0.032   0.000   0.031 
par(mfrow = c(2, 1))
plot(res$max_stat, type = "l")
plot(res$sum_stat, type = "l")

par(mfrow = c(1, 1))

which.max(table(apply(res$tau_stat[, 1:res$n], 2, median)))
647 
116 

One-sided Detection with side Argument

You can restrict detection to only upward (right) or downward (left) changes using the side argument. For example, to detect only upward changes (right-sided):

theta0 <- 0
set.seed(45)
Y <- c(rnorm(1e3, theta0), rnorm(500, theta0 + 2))

# Only detect upward changes (right-sided)
res_right <- focus_offline(Y, 50, family = "gaussian", theta0 = NaN, side = "right")
plot(res_right$stat, type = "l", main = "Right-sided FOCuS Statistic")

# Only detect downward changes (left-sided)
res_left <- focus_offline(Y, 50, family = "gaussian", theta0 = NaN, side = "left")
plot(res_left$stat, type = "l", main = "Left-sided FOCuS Statistic")

About

Efficient algorithms for detecting changes in data streams, based on the Focus algorithm. Includes Ex-FOCuS as well as NP-FOCuS.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •