London Weather/Crime (Multivariate)

Introduction

In this example we will study the relationship between weather and crime in in Greater London. In particular we will be considering ten years of monthly data about weather and crime in Greater London from 2008 to 2018.

x = london()
plot(x)
2008-01-012011-01-012014-01-012017-01-0101.0×1042.0×1043.0×104MaxTempMinTempAirFrostRainSunViolenceBurglaryDamageDrugsFraudOtherRobberySexualTheft

The variables at play will be:

  • MaxTemp: Monthly data for the mean daily maximum temperature in C° for each month.
  • Violence: Violent crimes comprising; Assault with Injury, Common Assault, Grievous Bodily Harm, Harassment, Murder, Offensive Weapon, Other violence, Wounding/GBH
  • Sexual: Rape, Other sexual crimes.

Descriptive Analysis

Seasonality

splot(x[:,["Date","MaxTemp"]], title="Seasonal Maximum Temperature")
splot(x[:,["Date","Violence"]], title="Seasonal Violent Crimes")
splot(x[:,["Date","Sexual"]], title="Seasonal Sexual Crimes")

Sesonality patterns for MaxTemp are obvious and noticeable for Violence, however there's barely any to be seen in Sexual.

Normalization

To compare time series with different positive magnitudes and to facilitate an stationary forecasting we will normalize data by dividing each variable by its maximum value.

tx = copy(x)
tx[!,2:end] = Array(x[:,2:end]) ./   (1. * mapslices(maximum, Array(x[:,2:end]), dims=[1]))
tx = tx[:,["Date","MaxTemp","Violence","Sexual"]]

To achieve an stationary time series to start our analysis we need to differentiate MaxTemp with a lag of 12, however this might not be enough for Violence and Sexual since they also show a marked trend from 2013 onwards.

dtx = d(tx,1,12)
plot(dtx)
2009-01-012012-01-012015-01-012018-01-01−0.2−0.10.00.10.2d[1,12]_MaxTempd[1,12]_Violenced[1,12]_Sexual

An extra differentation might be required for Violence and Sexual but this seems to be one to many for MaxTemp indicated by an increase in its variance when doing so. Also, the apparent need for a differentation in Violence and Sexual seems to be a temporary effect rather than a permanent trend or seasonality therefore we will continue with just one differentation of order 12.

Fitting an AR(1) model

Given the scarcity of data for a multivariate model we cannot fit too many parameters while keeping its significance. We will therefore fit an AR of order one.

xar = ar(dtx,1,false)
Multivariate Autoregressive Model

    ar(X, order=1, constant=false)

Residuals Summary
┌──────────────────┬────────────┬────────────┬──────────────┬────────────┬───────────┬───────────┬───────────┐
│         Variable │        Min │         1Q │       Median │       Mean │        3Q │       Max │ H0 Normal │
├──────────────────┼────────────┼────────────┼──────────────┼────────────┼───────────┼───────────┼───────────┤
│  d[1,12]_MaxTemp │   -0.26383 │ -0.0586484 │ -0.000102283 │ 0.00233686 │ 0.0612348 │  0.205427 │  0.359532 │
│ d[1,12]_Violence │ -0.0621247 │ -0.0201322 │ -0.000376752 │ 0.00265504 │ 0.0240683 │ 0.0918733 │  0.282503 │
│   d[1,12]_Sexual │  -0.113873 │ -0.0224348 │    0.0130658 │  0.0117103 │ 0.0432341 │  0.110499 │  0.561249 │
└──────────────────┴────────────┴────────────┴──────────────┴────────────┴───────────┴───────────┴───────────┘
┌──────────────────┬────────────┬─────────────┬─────────────┬─────────────┐
│         Variable │       Mean │    Variance │    Skewness │    Kurtosis │
├──────────────────┼────────────┼─────────────┼─────────────┼─────────────┤
│  d[1,12]_MaxTemp │ 0.00233686 │  0.00689961 │  0.00689961 │  0.00689961 │
│ d[1,12]_Violence │ 0.00265504 │ 0.000875094 │ 0.000875094 │ 0.000875094 │
│   d[1,12]_Sexual │  0.0117103 │  0.00224528 │  0.00224528 │  0.00224528 │
└──────────────────┴────────────┴─────────────┴─────────────┴─────────────┘

Coefficients

Φ0
┌         ┐
│ 0.0 +   │
│ 0.0 +   │
│ 0.0 +   │
└         ┘
Φ1
┌                                   ┐
│  0.372 ***  -0.179      0.149     │
│ -0.131 ***   0.911 ***  0.024     │
│ -0.074 ^     0.295 **   0.601 *** │
└                                   ┘
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘^’ 0.1 ‘ ’ 1  and ‘+’ if fixed

Σ2 Variance/Covariance Matrix
┌                                      ┐
│ 0.00734058    0.0016834   0.00124417 │
│  0.0016834  0.000937838  0.000718899 │
│ 0.00124417  0.000718899   0.00253389 │
└                                      ┘

Information Criteria
┌──────────┬───────────┬─────────┬──────────┐
│      AIC │      AICC │     BIC │      H&Q │
├──────────┼───────────┼─────────┼──────────┤
│ -3.42655 │ -0.344114 │ 548.682 │ -2.66244 │
└──────────┴───────────┴─────────┴──────────┘

Statistics
┌──────────────────┬─────────────────┬──────────┬──────────┐
│         Variable │ Fisher's p-test │       R2 │    R2adj │
├──────────────────┼─────────────────┼──────────┼──────────┤
│  d[1,12]_MaxTemp │      1.12247e-8 │ 0.142054 │ 0.122388 │
│ d[1,12]_Violence │             0.0 │  0.77584 │ 0.770702 │
│   d[1,12]_Sexual │      2.93029e-8 │ 0.407072 │  0.39348 │
└──────────────────┴─────────────────┴──────────┴──────────┘

The first thing we can do is to use Φ1 to infer directional causality, in this case we see how MaxTemp is not influenced at all (with statistical significance) by Violence and Sexual, which could not possibly be otherwise in this case. On the other hand we have Violence and Sexual being affected significantly by MaxTemp.

The relathionship between Violence and Sexual is both ways, however, the effect Sexual has on Violence is small and barely significant which might prompt us to consider to remove it altogether, the opposite though is not true, Violence has a strong and significant influence in Sexual.

Fixing Coefficients

Let's now fix to 0 the influcence of Violence and Sexual on MaxTemp, remove the influence from MaxTemp on Sexual since it is accounted for via Violence and the non-significant influence of Sexual on Violence. If now we fit again the model we have

dϕ1 = xar.Φ
dϕ2 = copy(dϕ1)
dϕ2[1,2:3,1] .= 0.0
dϕ2[3,1,1] = 0.0
dϕ2[2,3,1] = 0.0
dΦ = (dϕ1, dϕ2)
fxar = ar(dtx,1,false; dΦ)
Multivariate Autoregressive Model

    ar(X, order=1, constant=false)

Residuals Summary
┌──────────────────┬────────────┬────────────┬─────────────┬────────────┬───────────┬───────────┬───────────┐
│         Variable │        Min │         1Q │      Median │       Mean │        3Q │       Max │ H0 Normal │
├──────────────────┼────────────┼────────────┼─────────────┼────────────┼───────────┼───────────┼───────────┤
│  d[1,12]_MaxTemp │  -0.258699 │ -0.0586711 │  0.00278472 │ 0.00460774 │ 0.0593254 │   0.20366 │   0.38802 │
│ d[1,12]_Violence │ -0.0620228 │ -0.0195767 │ 0.000967609 │ 0.00322703 │ 0.0232835 │ 0.0921191 │  0.279685 │
│   d[1,12]_Sexual │  -0.112371 │ -0.0192755 │   0.0145777 │   0.012186 │ 0.0461189 │  0.109148 │  0.544738 │
└──────────────────┴────────────┴────────────┴─────────────┴────────────┴───────────┴───────────┴───────────┘
┌──────────────────┬────────────┬─────────────┬─────────────┬─────────────┐
│         Variable │       Mean │    Variance │    Skewness │    Kurtosis │
├──────────────────┼────────────┼─────────────┼─────────────┼─────────────┤
│  d[1,12]_MaxTemp │ 0.00460774 │  0.00695722 │  0.00695722 │  0.00695722 │
│ d[1,12]_Violence │ 0.00322703 │ 0.000873262 │ 0.000873262 │ 0.000873262 │
│   d[1,12]_Sexual │   0.012186 │  0.00227608 │  0.00227608 │  0.00227608 │
└──────────────────┴────────────┴─────────────┴─────────────┴─────────────┘

Coefficients

Φ0
┌         ┐
│ 0.0 +   │
│ 0.0 +   │
│ 0.0 +   │
└         ┘
Φ1
┌                                  ┐
│  0.368 ***    0.0 +      0.0 +   │
│ -0.131 ***  0.932 ***    0.0 +   │
│    0.0 +    0.279 **   0.594 *** │
└                                  ┘
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘^’ 0.1 ‘ ’ 1  and ‘+’ if fixed

Σ2 Variance/Covariance Matrix
┌                                      ┐
│ 0.00741873    0.0016939   0.00124207 │
│  0.0016939  0.000939497  0.000719417 │
│ 0.00124207  0.000719417   0.00257882 │
└                                      ┘

Information Criteria
┌──────────┬──────────┬─────────┬──────────┐
│      AIC │     AICC │     BIC │      H&Q │
├──────────┼──────────┼─────────┼──────────┤
│ -4.01645 │ -3.50617 │ 376.335 │ -3.59194 │
└──────────┴──────────┴─────────┴──────────┘

Statistics
┌──────────────────┬─────────────────┬──────────┬──────────┐
│         Variable │ Fisher's p-test │       R2 │    R2adj │
├──────────────────┼─────────────────┼──────────┼──────────┤
│  d[1,12]_MaxTemp │      5.65306e-9 │  0.13292 │ 0.123095 │
│ d[1,12]_Violence │             0.0 │ 0.775443 │ 0.772899 │
│   d[1,12]_Sexual │      2.62374e-8 │ 0.396559 │ 0.389722 │
└──────────────────┴─────────────────┴──────────┴──────────┘

Transformed Forecasting

Next we can see the two years forecast for the transformed dataset.

dtfc = forecast(fxar,12*2)
plot(dtfc)
2013-01-012015-01-012017-01-012019-01-01Time−0.3−0.2−0.10.00.10.2Forecasting ar(X, order=1, constant=false)d[1,12]_MaxTempd[1,12]_Violenced[1,12]_Sexual

In order to obtain a forecast for the original data we need to invert the transformations carried out previously.

Scaled Forecasting

Next we can see the two years forecast for the scaled dataset.

x0 = Array(tx[1:12,2:end])
x0 = reshape(x0',1,3,12)
tfc = p(dtfc, x0)
setnames!(tfc,["MaxTemp / $(maximum(x.MaxTemp))",
               "Violence / $(maximum(x.Violence))",
               "Sexual / $(maximum(x.Sexual))"])
plot(tfc,title="Greater London Crime/Weather Scaled Forecast")
2013-01-012015-01-012017-01-012019-01-01Time0.00.20.40.60.81.01.2Greater London Crime/Weather Scaled ForecastMaxTemp / 28.3Violence / 25050Sexual / 2016

The two years forecast follows:

Final Forecast

fc = transform(tfc,(v->v*maximum(x.MaxTemp)),1);
fc = transform(fc,(v->round(v*maximum(x.Violence))),2)
fc = transform(fc,(v->round(v*maximum(x.Sexual))),3)
setnames!(fc,["MaxTemp","Violence","Sexual"])

fc.mean

24 rows × 4 columns

DateMaxTempViolenceSexual
Date…Float64Float64Float64
12019-01-0110.436621530.01699.0
22019-02-016.9713119412.01578.0
32019-03-019.8999322321.01752.0
42019-04-0115.536821654.01651.0
52019-05-0120.813624424.01877.0
62019-06-0124.20524076.01920.0
72019-07-0128.301825919.02072.0
82019-08-0124.500722048.01702.0
92019-09-0120.900222048.01780.0
102019-10-0116.500122658.01821.0
112019-11-0112.222219.01769.0
122019-12-0110.722143.01519.0
132020-01-0110.436622098.01736.0
142020-02-016.9713119942.01613.0
152020-03-019.8999322815.01785.0
162020-04-0115.536822114.01682.0
172020-05-0120.813624852.01906.0
182020-06-0124.20524475.01946.0
192020-07-0128.301826291.02097.0
202020-08-0124.500722395.01725.0
212020-09-0120.900222372.01802.0
222020-10-0116.500122959.01841.0
232020-11-0112.222500.01788.0
242020-12-0110.722405.01537.0