---
bibliography: swatanomalyref.bib
output:
github_document:
pandoc_args: --webtex
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
out.width = "70%",
fig.align = "center",
fig.width = 6,
fig.asp = .618
)
options(digits = 3)
pander::panderOptions("round", 3)
```
# `swatanomaly`
Various anomaly detection algorithms studied in the *anomaly detection project*, including the K-L divergence method of @Cho:2019ji.
## Installation
```{r, eval=FALSE}
devtools::install_github("ygeunkim/swatanomaly")
```
```{r, eval=FALSE}
library(swatanomaly)
```
<!-- This `R` package covers NND and K-L divergence. As a baseline method, we provide a static threshold method. -->
## Kullback-Leibler Divergence Algorithm
This algorithm implements a neighboring-window method. Using `stats::density()`, we first estimate the probability distribution of each window. We can then compute the K-L divergence from $p$ to $q$ by
$$D_{KL} = \sum_{x \in \mathcal{X}} p(x) \ln \frac{p(x)}{q(x)}$$
where $\mathcal{X}$ is the support of $p$ ($f_1$).
The algorithm uses a threshold $\lambda$ and checks whether $D < \lambda$. If this holds, the two windows can be regarded as drawn from the same Gaussian distribution.
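As a sketch of this step (not the package's implementation), the density estimation and the discrete K-L sum can be written as follows. The helper name `kl_div`, the grid size, and the zero-guard `eps` are all illustrative assumptions:

```r
# Hypothetical helper, not exported by swatanomaly: estimate both window
# densities with stats::density() on a common grid, then compute the
# discrete K-L divergence from p to q.
kl_div <- function(x, y, n_grid = 512) {
  rng <- range(c(x, y))
  p <- density(x, from = rng[1], to = rng[2], n = n_grid)
  q <- density(y, from = rng[1], to = rng[2], n = n_grid)
  # normalize grid values to probability masses; guard against log(0)
  px <- p$y / sum(p$y)
  qx <- q$y / sum(q$y)
  eps <- .Machine$double.eps
  sum(px * log((px + eps) / (qx + eps)))
}

set.seed(1)
kl_div(rnorm(100), rnorm(100, mean = 3))  # windows with different means diverge
```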
### Fixed lambda algorithm
- Data: univariate series of size $n$
- Input: window size $win$, jump size for sliding window $jump$, threshold $\lambda$
Note that the number of windows is
$$\frac{n - win}{jump} + 1 \equiv w$$
For example, $n = 1000$, $win = 100$, and $jump = 50$ give $w = (1000 - 100)/50 + 1 = 19$ windows.
1. For $i \leftarrow 1$ to $(w - 1)$ do
    1. if $D < \lambda$ then
        1. Let $f_1$ be the pdf of the $i$-th window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f_1$.
        3. Set $D = d_i$.
        4. Set $f^{\prime} = f_1$.
    2. else
        1. Let $f^{\prime}$ be the pdf of the last normal window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f^{\prime}$.
        3. Set $D = d_i$.
2. Output: $\{ d_1, \ldots, d_{w - 1} \}$
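The loop above can be sketched in plain R as follows. The function names `kl_fixed` and `gauss_kl` are hypothetical, not part of `swatanomaly`; `gauss_kl` is a closed-form Gaussian K-L used as a cheap stand-in for the density-based divergence, and the initial $D = -\infty$ is an assumption so that the first iteration compares adjacent windows:

```r
# Closed-form KL( N(mean(x), sd(x)) || N(mean(y), sd(y)) ) as a stand-in
# for the density-based divergence (illustrative helper).
gauss_kl <- function(x, y) {
  log(sd(y) / sd(x)) + (sd(x)^2 + (mean(x) - mean(y))^2) / (2 * sd(y)^2) - 0.5
}

# Sketch of the fixed-lambda algorithm (hypothetical, not the package code).
kl_fixed <- function(x, win, jump, lambda, kl = gauss_kl) {
  w <- (length(x) - win) %/% jump + 1                # number of windows
  window_at <- function(i) x[seq((i - 1) * jump + 1, length.out = win)]
  d <- numeric(w - 1)
  D <- -Inf                                          # first step takes the "if" branch
  f_prime <- window_at(1)                            # last window judged normal
  for (i in seq_len(w - 1)) {
    if (D < lambda) {
      d[i] <- kl(window_at(i), window_at(i + 1))     # compare adjacent windows
      f_prime <- window_at(i)                        # remember the normal window
    } else {
      d[i] <- kl(f_prime, window_at(i + 1))          # compare against last normal window
    }
    D <- d[i]
  }
  d
}

set.seed(1)
series <- c(rnorm(400), rnorm(100, mean = 5), rnorm(400))
d <- kl_fixed(series, win = 100, jump = 50, lambda = 1)
which(d > 1)  # divergences spike near the mean shift
```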
### Dynamic lambda algorithm
- Data: univariate series of size $n$
- Input: window size $win$, jump size for sliding window $jump$, threshold increment $\lambda^{\prime}$, threshold updating constant $\epsilon$
The threshold is initialized by
$$\lambda = \lambda^{\prime} \epsilon$$
If $D < \lambda$ at step $i$, it is updated by
$$\lambda = \lambda^{\prime} (d_{i - 2} + \epsilon)$$
1. $\lambda = \lambda^{\prime} \epsilon$
2. Compute $d_1$, the K-L divergence from the $1$-st to the $2$-nd window.
3. Compute $d_2$, the K-L divergence from the $2$-nd to the $3$-rd window.
4. For $i \leftarrow 3$ to $(w - 1)$ do
    1. if $D < \lambda$ then
        1. Let $f_1$ be the pdf of the $i$-th window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f_1$.
        3. Update $\lambda = \lambda^{\prime} (d_{i - 2} + \epsilon)$.
        4. Set $D = d_i$.
        5. Set $f^{\prime} = f_1$.
    2. else
        1. Let $f^{\prime}$ be the pdf of the last normal window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f^{\prime}$.
        3. Set $D = d_i$.
5. Output: $\{ d_1, \ldots, d_{w - 1} \}$
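The dynamic variant can be sketched the same way. As before, `kl_dynamic` and `gauss_kl` are illustrative names, not `swatanomaly` functions; `gauss_kl` is a closed-form Gaussian K-L standing in for the density-based divergence:

```r
# Closed-form Gaussian K-L as a cheap stand-in (illustrative helper).
gauss_kl <- function(x, y) {
  log(sd(y) / sd(x)) + (sd(x)^2 + (mean(x) - mean(y))^2) / (2 * sd(y)^2) - 0.5
}

# Sketch of the dynamic-lambda algorithm (hypothetical, not the package code).
kl_dynamic <- function(x, win, jump, lambda_prime, eps, kl = gauss_kl) {
  w <- (length(x) - win) %/% jump + 1
  window_at <- function(i) x[seq((i - 1) * jump + 1, length.out = win)]
  d <- numeric(w - 1)
  lambda <- lambda_prime * eps                       # initial threshold
  d[1] <- kl(window_at(1), window_at(2))
  d[2] <- kl(window_at(2), window_at(3))
  D <- d[2]
  f_prime <- window_at(2)                            # last window judged normal
  for (i in 3:(w - 1)) {
    if (D < lambda) {
      d[i] <- kl(window_at(i), window_at(i + 1))
      lambda <- lambda_prime * (d[i - 2] + eps)      # threshold tracks recent divergence
      f_prime <- window_at(i)
    } else {
      d[i] <- kl(f_prime, window_at(i + 1))          # compare against last normal window
    }
    D <- d[i]
  }
  d
}

set.seed(1)
series <- c(rnorm(400), rnorm(100, mean = 5), rnorm(400))
d <- kl_dynamic(series, win = 100, jump = 50, lambda_prime = 2, eps = 0.5)
```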
***
# References