---
bibliography: swatanomalyref.bib
output:
github_document:
pandoc_args: --webtex
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
out.width = "70%",
fig.align = "center",
fig.width = 6,
fig.asp = .618
)
options(digits = 3)
pander::panderOptions("round", 3)
```
# `swatanomaly`
Various anomaly detection algorithms studied in the *anomaly detection project*, including the K-L divergence method of @Cho:2019ji.
## Installation
```{r, eval=FALSE}
devtools::install_github("ygeunkim/swatanomaly")
```
```{r, eval=FALSE}
library(swatanomaly)
```
<!-- This `R` package covers NND and K-L divergence. As a baseline method, we provide a static threshold method. -->
## Kullback-Leibler Divergence Algorithm
This algorithm implements a neighboring-window method. Using `stats::density()`, we first estimate the probability distribution of each window. We can then compute the K-L divergence from $p$ to $q$ by
$$D_{KL} = \sum_{x \in \mathcal{X}} p(x) \ln \frac{p(x)}{q(x)}$$
where $\mathcal{X}$ is the support of $p$ ($f_1$).
The algorithm uses a threshold $\lambda$ and checks whether $D < \lambda$. If this holds, the two windows can be regarded as drawn from the same Gaussian distribution.
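As a sketch of this step (not the package's implementation), the density estimation and the discrete K-L sum can be written as follows. The helper name `kl_div`, the grid size, and the zero-guard `eps` are all illustrative assumptions:

```r
# Hypothetical helper, not exported by swatanomaly: estimate both window
# densities with stats::density() on a common grid, then compute the
# discrete K-L divergence from p to q.
kl_div <- function(x, y, n_grid = 512) {
  rng <- range(c(x, y))
  p <- density(x, from = rng[1], to = rng[2], n = n_grid)
  q <- density(y, from = rng[1], to = rng[2], n = n_grid)
  # normalize grid values to probability masses; guard against log(0)
  px <- p$y / sum(p$y)
  qx <- q$y / sum(q$y)
  eps <- .Machine$double.eps
  sum(px * log((px + eps) / (qx + eps)))
}

set.seed(1)
kl_div(rnorm(100), rnorm(100, mean = 3))  # windows with different means diverge
```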
### Fixed lambda algorithm
- Data: univariate series of size $n$
- Input: window size $win$, jump size for sliding window $jump$, threshold $\lambda$
Note that the number of windows is
$$\frac{n - win}{jump} + 1 \equiv w$$
For example, $n = 1000$, $win = 100$, and $jump = 50$ give $w = (1000 - 100)/50 + 1 = 19$ windows.
1. For $i \leftarrow 1$ to $(w - 1)$ do
    1. if $D < \lambda$ then
        1. Let $f_1$ be the pdf of the $i$-th window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f_1$.
        3. Set $D = d_i$.
        4. Set $f^{\prime} = f_1$.
    2. else
        1. Let $f^{\prime}$ be the pdf of the last normal window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f^{\prime}$.
        3. Set $D = d_i$.
2. Output: $\{ d_1, \ldots, d_{w - 1} \}$
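The loop above can be sketched in plain R as follows. The function names `kl_fixed` and `gauss_kl` are hypothetical, not part of `swatanomaly`; `gauss_kl` is a closed-form Gaussian K-L used as a cheap stand-in for the density-based divergence, and the initial $D = -\infty$ is an assumption so that the first iteration compares adjacent windows:

```r
# Closed-form KL( N(mean(x), sd(x)) || N(mean(y), sd(y)) ) as a stand-in
# for the density-based divergence (illustrative helper).
gauss_kl <- function(x, y) {
  log(sd(y) / sd(x)) + (sd(x)^2 + (mean(x) - mean(y))^2) / (2 * sd(y)^2) - 0.5
}

# Sketch of the fixed-lambda algorithm (hypothetical, not the package code).
kl_fixed <- function(x, win, jump, lambda, kl = gauss_kl) {
  w <- (length(x) - win) %/% jump + 1                # number of windows
  window_at <- function(i) x[seq((i - 1) * jump + 1, length.out = win)]
  d <- numeric(w - 1)
  D <- -Inf                                          # first step takes the "if" branch
  f_prime <- window_at(1)                            # last window judged normal
  for (i in seq_len(w - 1)) {
    if (D < lambda) {
      d[i] <- kl(window_at(i), window_at(i + 1))     # compare adjacent windows
      f_prime <- window_at(i)                        # remember the normal window
    } else {
      d[i] <- kl(f_prime, window_at(i + 1))          # compare against last normal window
    }
    D <- d[i]
  }
  d
}

set.seed(1)
series <- c(rnorm(400), rnorm(100, mean = 5), rnorm(400))
d <- kl_fixed(series, win = 100, jump = 50, lambda = 1)
which(d > 1)  # divergences spike near the mean shift
```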
### Dynamic lambda algorithm
- Data: univariate series of size $n$
- Input: window size $win$, jump size for sliding window $jump$, threshold increment $\lambda^{\prime}$, threshold updating constant $\epsilon$
The threshold is initialized by
$$\lambda = \lambda^{\prime} \epsilon$$
If $D < \lambda$ at step $i$, it is updated by
$$\lambda = \lambda^{\prime} (d_{i - 2} + \epsilon)$$
1. $\lambda = \lambda^{\prime} \epsilon$
2. Compute $d_1$, the K-L divergence from the $1$-st to the $2$-nd window.
3. Compute $d_2$, the K-L divergence from the $2$-nd to the $3$-rd window.
4. For $i \leftarrow 3$ to $(w - 1)$ do
    1. if $D < \lambda$ then
        1. Let $f_1$ be the pdf of the $i$-th window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f_1$.
        3. Update $\lambda = \lambda^{\prime} (d_{i - 2} + \epsilon)$.
        4. Set $D = d_i$.
        5. Set $f^{\prime} = f_1$.
    2. else
        1. Let $f^{\prime}$ be the pdf of the last normal window and $f_2$ the pdf of the $(i + 1)$-th window.
        2. Compute the K-L divergence $d_i$ from $f_2$ to $f^{\prime}$.
        3. Set $D = d_i$.
5. Output: $\{ d_1, \ldots, d_{w - 1} \}$
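The dynamic variant can be sketched the same way. As before, `kl_dynamic` and `gauss_kl` are illustrative names, not `swatanomaly` functions; `gauss_kl` is a closed-form Gaussian K-L standing in for the density-based divergence:

```r
# Closed-form Gaussian K-L as a cheap stand-in (illustrative helper).
gauss_kl <- function(x, y) {
  log(sd(y) / sd(x)) + (sd(x)^2 + (mean(x) - mean(y))^2) / (2 * sd(y)^2) - 0.5
}

# Sketch of the dynamic-lambda algorithm (hypothetical, not the package code).
kl_dynamic <- function(x, win, jump, lambda_prime, eps, kl = gauss_kl) {
  w <- (length(x) - win) %/% jump + 1
  window_at <- function(i) x[seq((i - 1) * jump + 1, length.out = win)]
  d <- numeric(w - 1)
  lambda <- lambda_prime * eps                       # initial threshold
  d[1] <- kl(window_at(1), window_at(2))
  d[2] <- kl(window_at(2), window_at(3))
  D <- d[2]
  f_prime <- window_at(2)                            # last window judged normal
  for (i in 3:(w - 1)) {
    if (D < lambda) {
      d[i] <- kl(window_at(i), window_at(i + 1))
      lambda <- lambda_prime * (d[i - 2] + eps)      # threshold tracks recent divergence
      f_prime <- window_at(i)
    } else {
      d[i] <- kl(f_prime, window_at(i + 1))          # compare against last normal window
    }
    D <- d[i]
  }
  d
}

set.seed(1)
series <- c(rnorm(400), rnorm(100, mean = 5), rnorm(400))
d <- kl_dynamic(series, win = 100, jump = 50, lambda_prime = 2, eps = 0.5)
```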
***
# References