Getting Started

To demonstrate the basic usage of DiffinDiffs.jl, we walk through reproducing empirical results from relevant studies. Please refer to the original papers for details on the context.

Dynamic Effects in Event Studies

As a starting point, we reproduce results from the empirical illustration in Liyang Sun and Sarah Abraham (2021).

Data Preparation

DiffinDiffs.jl requires that the data used for estimation be stored in a column table compatible with the interface defined in Tables.jl. This means that virtually all data frame types, including those from DataFrames.jl, are supported. For the sake of illustration, here we directly load the dataset that is bundled with the package by calling exampledata:

using DiffinDiffs
hrs = exampledata("hrs")
3280×11 VecColumnTable:
  Row │ hhidpn   wave  wave_hosp  oop_spend  riearnsemp  rwthh   male  spouse  ⋯
      │  Int64  Int64      Int64    Float64     Float64  Int64  Int64   Int64  ⋯
──────┼─────────────────────────────────────────────────────────────────────────
    1 │      1     10         10    6532.91   6.37159e5   4042      0       0  ⋯
    2 │      1      8         10    1326.93   3.67451e5   3975      0       0  ⋯
    3 │      1     11         10    1050.33     74130.5   3976      0       0  ⋯
    4 │      1      9         10    979.418     84757.4   3703      0       0  ⋯
    5 │      1      7         10    5498.68   1.66128e5   5295      0       0  ⋯
    6 │      2      8          8    41504.0         0.0   5187      0       1  ⋯
    7 │      2      7          8    3672.86         0.0   4186      0       1  ⋯
    8 │      2     10          8    1174.19         0.0   3729      0       1  ⋯
  ⋮   │   ⋮       ⋮        ⋮          ⋮          ⋮         ⋮      ⋮      ⋮     ⋱
 3273 │    655      8          9     1530.0     45000.0   8461      0       1  ⋯
 3274 │    655      9          9    7373.89     10359.2   9345      0       1  ⋯
 3275 │    655     10          9    673.568     38229.5   8420      0       1  ⋯
 3276 │    656     11          8    3020.78         0.0   1930      0       0  ⋯
 3277 │    656      8          8     2632.0         0.0   4810      0       0  ⋯
 3278 │    656      9          8     657.34         0.0   4768      0       0  ⋯
 3279 │    656     10          8    782.795         0.0   1909      0       0  ⋯
 3280 │    656      7          8    4182.39         0.0   4374      0       0  ⋯

In this example, hhidpn, wave, and wave_hosp are the columns for the unit IDs, time IDs, and treatment time, respectively. The remaining columns contain the outcome variables and covariates. The time IDs and treatment time must be measured in compatible units, so that subtracting a treatment time from a calendar time (represented by a time ID) with the operator - yields a meaningful relative time: the amount of time elapsed since treatment.
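As a small illustration of relative time, the sketch below uses plain vectors copied from the first five rows of the table above. The variable names are ours and no package function is involved; it only shows the subtraction that the time columns need to support.

```julia
# Values copied from rows 1-5 of the example data (unit with hhidpn == 1)
wave = [10, 8, 11, 9, 7]      # calendar time of each observation
wave_hosp = fill(10, 5)       # treatment time, constant within a unit

# Relative time: periods elapsed since treatment (negative means before treatment)
rel = wave .- wave_hosp

println(rel)  # [0, -2, 1, -1, -3]
```

A column of dates would serve equally well as long as the difference between two of its values is meaningful for the application.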

Empirical Specifications

To produce the estimates reported in panel (a) of Table 3 from Liyang Sun and Sarah Abraham (2021), we specify the estimation via @did as follows:

r = @did(Reg, data=hrs, dynamic(:wave, -1), notyettreated(11),
    vce=Vcov.cluster(:hhidpn), yterm=term(:oop_spend), treatname=:wave_hosp,
    treatintterms=(), xterms=(fe(:wave)+fe(:hhidpn)))

Before we look at the results, we briefly explain the more important arguments. Reg, which is a shorthand for RegressionBasedDID, is the type of the estimation to be conducted. Here, we need an estimation that is conducted by directly solving a least-squares regression, and hence we use Reg to inform @did of the relevant set of procedures, which also determines the set of arguments that @did accepts.

We are interested in the dynamic treatment effects. Hence, we use dynamic to specify the data column containing the calendar time of the observations and the reference period, which is -1. For identification, a crucial assumption underlying DID is the parallel trends assumption. Here, with notyettreated(11), we assume that, in the absence of treatment, the average outcome paths of units treated in periods before 11 would have been parallel to the observed paths of units treated in period 11. That is, we take units with treatment time 11 as the not-yet-treated control group. We set treatname to :wave_hosp, the column that contains the treatment time. How treatname is interpreted depends on the context, which is jointly determined by the type of the estimator, the type of the treatment and possibly the type of parallel trends assumption. The remaining arguments provide additional information on the regression specification; their usage is described in the documentation for RegressionBasedDID.
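To see which cohort and relative-time combinations such a specification can involve, the following sketch enumerates them in plain Julia. It is our reading of the setup rather than the package's internal logic: it assumes that the data cover waves 7 to 11, that cohorts 8, 9 and 10 are the treated groups, and that observations from wave 11 are left out because the control group is itself treated by then (consistent with 2624 = 656 × 4 observations in the summary for r).

```julia
# Illustration only: plain Julia, no DiffinDiffs.jl internals involved
cohorts = 8:10     # treatment times of the treated cohorts; cohort 11 is the control
waves = 7:10       # retained calendar periods, assuming wave 11 is dropped
excluded = -1      # reference period passed to `dynamic`

# Every (cohort, relative time) pair observed in the retained waves
cells = [(g, t - g) for g in cohorts for t in waves]

# Remove the reference period; the remaining cells can carry a coefficient
estimated = filter(c -> c[2] != excluded, cells)

rels = sort(unique(last.(estimated)))
println(rels)               # [-3, -2, 0, 1, 2]
println(length(estimated))  # 9
```

The five distinct relative times agree with the "Relative time periods: 5" entry in the summary of r and with the rows produced by agg(r, :rel).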

We now move on to the result returned by @did:

──────────────────────────────────────────────────────────────────────
Summary of results: Regression-based DID
──────────────────────────────────────────────────────────────────────
Number of obs:               2624    Degrees of freedom:            14
F-statistic:                 6.42    p-value:                   <1e-07
──────────────────────────────────────────────────────────────────────
Cohort-interacted sharp dynamic specification
──────────────────────────────────────────────────────────────────────
Number of cohorts:              3    Interactions within cohorts:    0
Relative time periods:          5    Excluded periods:              -1
──────────────────────────────────────────────────────────────────────
Fixed effects: fe_hhidpn fe_wave
──────────────────────────────────────────────────────────────────────
Converged:                   true    Singletons dropped:             0
──────────────────────────────────────────────────────────────────────

The object returned is of type RegressionBasedDIDResult, which contains the estimates for treatment-group-specific average treatment effects among other information. Instead of printing the estimates from the regression, which can be very long if there are many treatment groups, the REPL prints a summary table for r. Here we verify that the estimate for relative time 0 among the cohort that received treatment in period 8 is about 2826, the value reported in the third column of Table 3(a) in the paper.

coef(r, "wave_hosp: 8 & rel: 0")
2825.5659117514188

Various accessor methods are defined for retrieving values from a result such as r. See Results for a full list of them.
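As a hedged sketch of how such accessors might be used, the block below repeats the estimation from above and then queries the result. Apart from coef, which appears earlier on this page, we assume that coefnames, vcov and nobs follow the usual StatsAPI conventions and are exported by DiffinDiffs.jl; check the Results page before relying on them.

```julia
using DiffinDiffs

hrs = exampledata("hrs")
r = @did(Reg, data=hrs, dynamic(:wave, -1), notyettreated(11),
    vce=Vcov.cluster(:hhidpn), yterm=term(:oop_spend), treatname=:wave_hosp,
    treatintterms=(), xterms=(fe(:wave)+fe(:hhidpn)))

# Assumed accessors (generic statistical-model functions); see Results for the full list
labels = coefnames(r)   # coefficient labels such as "wave_hosp: 8 & rel: 0"
b = coef(r)             # all point estimates, in the same order as `labels`
V = vcov(r)             # variance-covariance matrix of the estimates
n = nobs(r)             # number of observations used in the estimation
```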

Aggregation of Estimates

The treatment-group-specific estimates in r are typically not the ultimate objects of interest. We need to estimate the path of the average dynamic treatment effects across all treatment groups. Such estimates can be easily obtained by aggregating the estimates in r via agg:

a = agg(r, :rel)
───────────────────────────────────────────────────────────────────
         Estimate  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────
rel: -3   591.046    1273.08   0.46    0.6425  -1905.3      3087.39
rel: -2   352.639     697.78   0.51    0.6133  -1015.62     1720.9
rel: 0   2960.04      540.989  5.47    <1e-07   1899.23     4020.86
rel: 1    529.767     586.831  0.90    0.3667   -620.935    1680.47
rel: 2    800.106    1010.81   0.79    0.4287  -1181.97     2782.18
───────────────────────────────────────────────────────────────────

Note that :rel is a special value indicating that the aggregation is conducted separately for each value of relative time. The aggregation takes into account the sample weights of each treatment group and the variance-covariance matrix of the estimates. The resulting estimates match those reported in the second column of Table 3(a) exactly.
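The idea behind this kind of aggregation can be sketched with a generic weighted average and the variance of a linear combination. The numbers below are hypothetical (they are not taken from r), and the calculation is a textbook illustration rather than the exact procedure implemented by agg.

```julia
using LinearAlgebra

# Hypothetical cohort-specific estimates for one relative time (not from `r`)
b = [3000.0, 2500.0, 3500.0]

# Hypothetical variance-covariance matrix of those three estimates
V = [250000.0  10000.0      0.0;
      10000.0 360000.0  20000.0;
          0.0  20000.0 490000.0]

# Hypothetical cohort sizes used as weights
n = [200, 100, 100]

w = n ./ sum(n)            # normalize the weights so that they sum to one
est = dot(w, b)            # weighted average of the cohort-specific estimates
se = sqrt(dot(w, V * w))   # standard error of the linear combination w'b

println(est)  # 3000.0
```

Because the cohort-specific estimates are generally correlated, the off-diagonal entries of V matter for the standard error; ignoring them would amount to treating the cohorts as independent.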