# ChangePointDetection.jl

A fast Julia implementation of the least square density difference (LSDD) method. It is used to detect changepoints in time-series or to infer wether or not two time-series come from the same underlying probability distribution. The LSDD method was developped in the article *Density-difference estimation* from *M. Sugiyama, T. Suzuki, T. Kanamori, M. C. du Plessis, S. Liu, and I. Takeuchi.* in 2013.

Given two time-series and produced by underlying probability densities and , the LSDD method directly modelizes the difference to compute its L2 norm. This approach has good convergence properties and is more accurate than computing the KL-divergence of and after estimating them from scratch.

In practice, the closer the LSDD value is to 0, the more similar and are. To estimate changepoints in a time-series, two sliding windows are used. The LSDD value of these two sliding windows is computed along the whole time-series. Spikes in the obtained LSDD "profile" indicate potential changepoints. The procedure is best understood graphically :

One ends up with a time-dependant LSDD "profile", and threshold can be set to detect changepoints when exceeded.

## Usage :

To infer if two time-series come from the same underlying probability density, use the `lsdd(ts1, ts2)`

function :

```
lsdd(x::Array{Float64,1}, y::Array{Float64,1}; folds = 5, sigma_list = nothing, lambda_list = nothing)
Input :
'x', 'y' : the time-series of data upon which to perform the lsdd computation.
folds : the number of cross-validation tests. higher is more precise but more expensive.
sigma_list, lambda_list : points defining the grid search during the optimization of gaussian kernels.
Returns :
L2 : lsdd value.
```

To compute the LSDD profile of a time-series, use the `lsdd_profile(ts; window = 70)`

function :

```
lsdd_profile(ts; window = 70)
Input :
`ts` : the time-series to analyze
window : the size of the sliding windows used for the lsdd computation. Bigger windows will be more accurate but can rapidly become very cumputationally expensive.
Returns : lsdd profile of the input time-series. Note that since we are using 2 sliding windows, only time steps [1:end-2*window] are actually taken in account.
```

Finally, to run an automatized changepoint detection, you can use the `(ts; threshold = 0.5, window = 150)`

function.

```
Input
ts : time-series to analyse
threshold : when exceeded, a changepoint is detected
window : size of the sliding windows;
Returns : a 1-D array of all detected change points.
```

## Example :

The following code

```
using Plots
ts1 = rand(500)
ts2 = 1.5 * rand(500) .+ 1
ts = vcat(ts1, ts2)
profile = lsdd_profile(ts; window = 50)
points = getpoints(profile)
a = plot(ts,xlabel = "Time steps",ylabel = "Value",label = "",title = "Random time-series with change point",color = "black")
b = plot(profile,xlabel = "Time steps",ylabel = "LSDD value",label = "lsdd",title = "Corresponding LSDD profile",color = "black")
vline!(b,points,label = "detected changepoint \n (threshold exceeded)",color = "red",lw = 3,linestyle = :dash)
p = plot(a, b, layout = (2, 1), legend = true)
display(p)
```

Produced the following output :