Skip to content

xiaobaaaa/EventStudyInteracts.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EventStudyInteracts.jl

This package is a Julia replication of the Stata package eventstudyinteract provided by Sun and Abraham (2021).

The estimation process is identical to that of eventstudyinteract, and this package used FixedEffectModels.jl to implement the Stata command reghdfe functionality.

As introduced in the Stata package eventstudyinteract provided by Sun and Abraham (2021), the function of this package is:

eventstudyinteract implements the interaction weighted (IW) estimator and constructs pointwise confidence interval for the estimation of dynamic treatment effects. To estimate the dynamic effects of an absorbing treatment, researchers often use two-way fixed effects (TWFE) regressions that include leads and lags of the treatment (event study specification). Units are categorized into different cohorts based on their initial treatment timing. Sun and Abraham (2020) proposes this estimator as an alternative to the TWFE regression in the presence of treatment effects heterogeneous across cohorts. Under treatment effects heterogeneity, the TWFE regression can result in estimates with uninterpretable weights, which can be assessed by the Stata module eventstudyweights. The IW estimator is implemented in three steps. First, estimate the interacted regression with reghdfe, where the interactions are between relative time indicators and cohort indicators. Second, estimate the cohort shares underlying each relative time. Third, take the weighted average of estimates from the first step, with weights set to the estimated cohort shares.

Introduction

  • Julia is faster than Stata, and FixedEffectModels.jl is much faster than reghdfe.

  • I am a beginner in programming and the code is a complete translation of the Stata package eventstudyinteract. I used newbing to help me with programming, and it is very helpful for beginners.

  • I used FixedEffectModels.jl directly for estimation and I am grateful to the author of this package. Other parts of this package, such as reporting of estimation results, also refer to FixedEffectModels.jl.

  • The Stata package eventstudyinteract is very flexible, and one important feature is that it can compare event study estimates for subsamples, i.e. perform heterogeneous estimation. fixest can estimate Sun-Abraham specifications through sunab(...), but I still find the workflow design of the Stata package more user-friendly for these heterogeneous-estimation use cases.

  • This package fully supports the features of FixedEffectModels.jl, such as using GPU for estimation. The syntax of this package is similar to that of FixedEffectModels.jl.

Installation

EventStudyInteracts.jl is registered in the General registry.

The package is currently maintained against:

  • Julia 1.10+
  • FixedEffectModels.jl 1.13
  • Vcov.jl 0.8

You can install it at the REPL with ] add EventStudyInteracts.

For local development in this repository, run julia --project=. -e "using Pkg; Pkg.instantiate()".

For formatted regression tables, pair this package with RegressionTables.jl. The examples below use the current v0.7.x API; recent releases are listed on the official RegressionTables.jl releases page.

Local Validation

Before pushing a maintenance update, use the following local checks:

.\scripts\run_pkg_test.ps1
julia --project=. .\scripts\single_cohort_equivalence.jl

What these two commands cover:

  • scripts/run_pkg_test.ps1 bootstraps a local depot for this repository, rebuilds the local test manifest if needed, and runs the full Pkg.test() suite.
  • scripts/single_cohort_equivalence.jl generates a synthetic panel with one treated cohort and verifies that eventreg(...) matches a direct FixedEffectModels.reg(...) event-study regression exactly, including period-by-period standard errors and the aggregated ATE(g0:g3).

A change is ready to push to GitHub when all of the following are true:

  • Pkg.test() passes locally.
  • The single-cohort equivalence check reports zero differences.
  • The checked-in Stata reference comparison still passes.
  • README examples and benchmark claims are consistent with the current code.

Maintenance

This repository includes separate maintenance automation in addition to CI:

  • CI runs package tests on main.
  • CompatHelper watches dependency compat updates.
  • TagBot handles tags and releases.
  • Maintenance runs weekly and checks whether FixedEffectModels.jl has a newer upstream release, runs Julia 1.10 tests, runs a Julia stable import smoke test, and writes a maintenance report.
  • The main test suite also includes a RegressionTables.jl smoke test so table output stays visible during dependency upgrades.

The maintenance check compares the nlswork example against the checked-in Stata baseline in reference/nlswork_stata.toml. Any additional *.toml file placed in reference/ is picked up automatically, so you can add reference/boss_reference.toml to compare against your boss's saved results as well.

If you want CompatHelper to open PRs automatically on GitHub, add the repository secret COMPATHELPER_PRIV.

Codex Workflow

This repository is maintained with Codex, using the current GPT-5 coding model together with the local Julia, R, and Stata toolchain on this machine.

The default maintenance workflow is:

  1. Weekly monitoring: Maintenance checks upstream dependency drift, especially FixedEffectModels.jl, and writes a machine-generated report.
  2. Upgrade trigger: when FixedEffectModels.jl or another important dependency changes, Codex updates compat bounds, adapts the package code, and reruns the local validation workflow.
  3. Validation gate: the change is not considered complete until Pkg.test() passes, the RegressionTables.jl smoke test passes, and the checked-in Stata reference comparison passes.
  4. Documentation refresh: if behavior, compat, or benchmark results change, README.md, benchmark artifacts, and reference files are updated in the same maintenance cycle.
  5. Optional external parity: if reference/boss_reference.toml is present, the same workflow also checks the package against your boss's saved output.

This workflow is intended to guarantee the following:

  • EventStudyInteracts.jl remains compatible with Julia 1.10 LTS, with an additional stable-version import smoke check.
  • The nlswork reference example remains numerically close to the checked-in Stata baseline, within the tolerances stored in reference/nlswork_stata.toml.
  • RegressionTables.jl integration keeps working as part of the main automated test suite.
  • README performance claims are backed by checked-in scripts and saved benchmark outputs under benchmark/.

This workflow does not guarantee the following:

  • Exact equality for all summary metadata across package versions. Values such as degrees of freedom, adjusted R2, or iteration counts can change slightly across FixedEffectModels.jl releases even when the core coefficient vector remains aligned.
  • Equality to any outside result that is not checked into reference/.
  • Fully automatic release decisions. Codex can implement and validate changes, but versioning, tagging, and registration still follow the normal repository release process.

Syntax

result = eventreg(
    df,
    formula,
    rel_varlist::Vector{Symbol},
    control_cohort::Symbol,
    cohort::Symbol,
    vcov;
    contrasts = contrasts,
    weights = weights,
    save = save,
    method = method,
    nthreads = nthreads,
    double_precision = double_precision,
    tol = tol,
    maxiter = maxiter,
    drop_singletons = drop_singletons,
    progress_bar = progress_bar,
    dof_add = dof_add,
    subset = subset,
)

The syntax is similar to FixedEffectModels.jl, and the current package line is tested against FixedEffectModels.jl 1.13 on Julia 1.10+. Users need to specify rel_varlist::Vector{Symbol}, control_cohort::Symbol, and cohort::Symbol explicitly. The rel_varlist is the list of relative time indicators as you would have included in the canonical two-way fixed effects regression. The rel_varlist must be defined as a Vector{Symbol}.

dof_add is still accepted for backward compatibility, but it is currently ignored when this package is used with FixedEffectModels.jl 1.13+.

rel_varlist = Symbol[:g_4, :g_3, :g_2, :g0, :g1, :g2, :g3]

See illustration for an example of generating these relative time indicators. The relative time indicators should take the value of zero for never treated units. Users should shape their dataset to a long format where each observation is at the unit-time level. See illustration for an example of specifying the syntax. The syntax is similar to FixedEffectModels.jl in specifying fixed effects and the type of standard error reported.

Furthermore, it also requires the user to specify the cohort categories as well as which cohort is the control cohort. Note that Sun and Abraham (2021) only establishes the validity of the IW estimators for balanced panel data without covariates. eventstudyinteract does not impose a balanced panel by dropping units with observations that are missing in any iod. Instead, it evaluates the IW estimators for unbalanced panel data by estimating the interacted regression using all observations.

The control_cohort must be defined as a Symbol, which is a binary variable that corresponds to the control cohort, which can be never-treated units or last-treated units. If using last-treated unit as control cohort, exclude the time periods when the last cohort receives treatment.

The cohort must be defined as a Symbol, which is a categorical variable that corresponds to the initial treatment timing of each unit. If there are units that receive multiple treatments, Sun and Abraham (2021) defines the initial treatment timing to be based on the first treatment. This categorical variable should be set to be missing for never treated units.

Single-Cohort Equivalence

When all treated units start treatment in the same period, eventreg(...) should collapse to the same event-study regression you would estimate directly with FixedEffectModels.jl. In that special case there is only one treated cohort, so the IW weights are all equal to one and no extra share-aggregation uncertainty should remain.

This repository checks that case explicitly with scripts/single_cohort_equivalence.jl. The script generates a deterministic panel, estimates the model both ways, and compares every event-time coefficient, every standard error, and the average post-treatment effect ATE(g0:g3) under four covariance estimators. The same logic also runs inside Pkg.test().

To rerun the check locally:

julia --project=. .\scripts\single_cohort_equivalence.jl

On the current checked-in synthetic panel, the maximum absolute differences are:

Vcov max abs coef diff max abs se diff max abs vcov diff abs ATE diff abs ATE se diff
simple 0.0 0.0 0.0 0.0 0.0
robust 0.0 0.0 0.0 0.0 0.0
cluster(:id) 0.0 0.0 0.0 0.0 0.0
cluster(:id, :t) 0.0 0.0 0.0 0.0 0.0

For the two-way clustered case (cluster(:id, :t)), the period-by-period estimates and standard errors are:

Term Direct FEM estimate eventreg estimate Direct FEM std. error eventreg std. error
g_3 -0.71994246 -0.71994246 0.34987645 0.34987645
g_2 0.067715152 0.067715152 0.32766335 0.32766335
g0 -0.29595699 -0.29595699 0.33389899 0.33389899
g1 2.2834434 2.2834434 0.33145837 0.33145837
g2 2.094273 2.094273 0.34614113 0.34614113
g3 2.6978103 2.6978103 0.33474164 0.33474164
ATE(g0:g3) 1.6948924 1.6948924 0.33402829 0.33402829

This equivalence is a sanity check for the single-cohort edge case only. With multiple treated cohorts, a direct TWFE-style event-study regression and the Sun-Abraham IW estimator generally target different objects, which is exactly why this package exists.

Examples

  • You can download the nlswork.dta data from the repository, which is the example data used by eventstudyinteract, and the method of generating variables is the same as that of eventstudyinteract.
using DataFrames
using FixedEffectModels
using EventStudyInteracts
using ReadStatTables
using RegressionTables
using CUDA
using Plots

# Load the 1968 extract of the National Longitudinal Survey of Young Women and Mature Women.
df = DataFrame(readstat(joinpath("dataset", "nlswork.dta")))

# Code the cohort categorical variable based on when the individual first joined the union, which will be inputted in cohort(varname).

df.union_year = ifelse.(coalesce.(df.union, false) .== 1, df.year, missing)
# This is really frustrating. In Stata, the simple code bysort idcode: egen first_union = min(union_year) can achieve this. Why is Julia so complicated?
function min_skipmissing(x)
    v = collect(skipmissing(x))
    return isempty(v) ? missing : minimum(v)
end

transform!(groupby(df, :idcode), :union_year => min_skipmissing => :first_union)

select!(df, Not(:union_year))

# Code the relative time categorical variable.
df.ry = df.year - df.first_union

# For the first example, we take the control cohort to be individuals that never unionized.
df.never_union = ismissing.(df.first_union)

# Check if there is a sufficient number of treated units for each relative time. With very few units it might be better to bin the relative times and assume constant treatment effects within the bin.
function tab1(df::DataFrame, catevar::Symbol)
    gdf = groupby(df, catevar)
    result = combine(gdf, nrow => :freq)
    result.percent = result.freq / nrow(df) * 100
    sort!(result, catevar)
    result.cum = cumsum(result.percent)
    return result
end

tab1(df,:ry)
# We will consider the dynamic effect of union status on income. We first generate these relative time indicators, and leave out the distant leads due to few observations. Implicitly this assumes that effects outside the lead windows are zero.
relmin = abs(minimum(skipmissing(df.ry)))
relmax = abs(maximum(skipmissing(df.ry)))

for k in relmin:-1:2
    df[!, Symbol("g_$k")] = Int.(coalesce.(df.ry .== -k, false))
end

for k in 0:relmax
    df[!, Symbol("g$k")] = Int.(coalesce.(df.ry .== k, false))
end

absorb = [:idcode,:year]

formula1 = term(:ln_wage) ~ term(:south) + sum(fe.(term.(absorb)))

# To test whether the package can run successfully when there are collinear variables.
df.g_19 .= 0

# The rel_varlist must be defined as a Vector{Symbol}. In the example below, when defining an empty Vector, you cannot use rel_varlist1 = [].

rel_varlist1 = Symbol[:g_4, :g_5]

for i in relmin:-1:2
    push!(rel_varlist1, Symbol("g_"*string(i)))
end

for i in 0:relmax
    push!(rel_varlist1, Symbol("g"*string(i)))
end

control_cohort1 = :never_union

cohort1 = :first_union

vcov1 = Vcov.cluster(:idcode)

m1 = eventreg(df, formula1, rel_varlist1, control_cohort1, cohort1, vcov1)

#                    EventStudyInteract IW estimator
# ========================================================================
# Number of obs:                27979  Degrees of freedom:             166
# R2:                           0.666  R2 Adjusted:                  0.664
# F-Stat:                     2.81691  p-value:                      0.000
# R2 within:                    0.030  Iterations:                      11
# ========================================================================
# ln_wage |   Estimate Std.Error   t value Pr(>|t|)   Lower 95%  Upper 95%
# ------------------------------------------------------------------------
# g_19    |        0.0       0.0       NaN      NaN         0.0        0.0
# g_18    | -0.0609283 0.0513893  -1.18562    0.236   -0.161679  0.0398221
# g_17    | -0.0489819 0.0432146  -1.13346    0.257   -0.133706  0.0357418
# g_16    | -0.0680484 0.0495311  -1.37385    0.170   -0.165156  0.0290589
# g_15    |  -0.135218 0.0448805  -3.01284    0.003   -0.223208 -0.0472281
# g_14    |  -0.103295 0.0443376  -2.32975    0.020   -0.190221   -0.01637
# g_13    | -0.0684827 0.0416284   -1.6451    0.100   -0.150097  0.0131311
# g_12    |  -0.132015 0.0385567  -3.42392    0.001   -0.207606 -0.0564232
# g_11    |   -0.15239 0.0376567  -4.04683    0.000   -0.226217  -0.078563
# g_10    |   -0.11842 0.0291013  -4.06925    0.000   -0.175474 -0.0613663
# g_9     |  -0.172996 0.0330077  -5.24109    0.000   -0.237709  -0.108284
# g_8     |  -0.116192  0.030757  -3.77773    0.000   -0.176492 -0.0558915
# g_7     |  -0.140938 0.0287493  -4.90233    0.000   -0.197302 -0.0845745
# g_6     |  -0.133223 0.0369878  -3.60181    0.000   -0.205739 -0.0607073
# g_5     |  -0.131416 0.0238826  -5.50261    0.000   -0.178239 -0.0845938
# g_4     |  -0.150598 0.0330246  -4.56016    0.000   -0.215344 -0.0858518
# g_3     | -0.0559142 0.0202356  -2.76316    0.006  -0.0955869 -0.0162416
# g_2     | -0.0565837 0.0191366  -2.95683    0.003  -0.0941016 -0.0190657
# g0      |  0.0515795 0.0146284   3.52598    0.000      0.0229  0.0802591
# g1      |  0.0961542 0.0194692   4.93879    0.000   0.0579842   0.134324
# g2      |  0.0677024 0.0182556   3.70858    0.000   0.0319116   0.103493
# g3      |  0.0663372  0.018431   3.59923    0.000   0.0302027   0.102472
# g4      |   0.132223 0.0276378   4.78414    0.000   0.0780381   0.186408
# g5      |  0.0565864 0.0182183   3.10603    0.002   0.0208689  0.0923039
# g6      |   0.117224 0.0215077    5.4503    0.000   0.0750571    0.15939
# g7      |   0.072123 0.0221207   3.26042    0.001   0.0287545   0.115491
# g8      |  0.0666257 0.0193959   3.43504    0.001   0.0285993   0.104652
# g9      |  0.0929835 0.0349948   2.65707    0.008    0.024375   0.161592
# g10     |  0.0675955 0.0237323   2.84825    0.004   0.0210676   0.114124
# g11     |   0.106953 0.0212674   5.02896    0.000   0.0652575   0.148649
# g12     |  0.0647985 0.0365021    1.7752    0.076 -0.00676515   0.136362
# g13     |   0.134787 0.0412699   3.26599    0.001   0.0538762   0.215698
# g14     |   0.155875 0.0501801   3.10631    0.002   0.0574951   0.254255
# g15     |  0.0665542 0.0426601   1.56011    0.119  -0.0170823   0.150191
# g16     |   0.181412 0.0511286   3.54815    0.000   0.0811727   0.281651
# g17     |  0.0233864 0.0438817  0.532943    0.594  -0.0626452   0.109418
# g18     | -0.0477967 0.0750543 -0.636828    0.524   -0.194943  0.0993498
# ========================================================================

The repository maintenance workflow checks this example automatically against the checked-in Stata reference. The coefficient vector is expected to match very closely; summary metadata such as degrees of freedom, adjusted R2, or iteration counts can move slightly across FixedEffectModels.jl versions.

  • Event study plots. We may define a coefplot function for an event study plot.
function mycoefplot(m)
        n = coefnames(m)
        vals = coef(m)
        errors = stderror(m)
        Plots.scatter(
            n,
            vals,
            legend = false,
            yerror = 1.96 .* errors,
            #title = "Coefficient plot",
            xtickfont = font(10, "Times"),
            ytickfont = font(10, "Times"),
            titlefont = font(14, "Times"),
            # markersize = 2
    )
    Plots.vline!([3.5,],linestyle = :dash,linecolor=:red)
    Plots.hline!([0,],linestyle = :dash,linecolor=:black,linewidth=:2)
end

plot1 = mycoefplot(m1)
  • Binning. Pre-treatment effects seem relatively constant, which might suggest binning the many leads.  TODO: current implementation of bins does not follow Sun and Abraham (2020) exactly due to coding challenge.  But it is valid if effects in the bin are constant for each cohort.
df.g_l4 = Int.(coalesce.(df.ry .<= -4, false))

rel_varlist2 = Symbol[:g_l4]

for i in 3:-1:2
    push!(rel_varlist2, Symbol("g_"*string(i)))
end

for i in 0:18
    push!(rel_varlist2, Symbol("g"*string(i)))
end

m2 = eventreg(df, formula1, rel_varlist2, control_cohort1, cohort1, vcov1)
  • Using the last treated as the control cohort. Alternatively, we can take the control cohort to be individuals that were unionized last.
df.last_union = Int.(coalesce.(df.first_union .== 88, false))
control_cohort2 = :last_union
# If using the last-treated cohort as the control, be sure to restrict the analysis sample to exclude the never-treated (if any) and to be before the treated periods for the last-treated cohort.

m3 = eventreg(df[(.!ismissing.(df.first_union)) .& (df.year .< 88), :], formula1, rel_varlist2, control_cohort1, cohort1, vcov1)
  • Aggregating event study estimates. In Stata one can use lincom looks for coefficients and variance covariance matrix stored in e(b) and e(V). There is no package like lincom in Julia. I wrote another package LinComs.jl to do this. You can refer to the following code after estimating the results.
rel_varlist1[20:38]
expr = Expr(:call, :/, Expr(:call, :+, rel_varlist1...), length(rel_varlist1))
# Or,
expr = :((g_20 + g_19 + g_18 + g_17 + g_16 + g_15 + g_14 + g_13 + g_12 + g_11 + g_10 + g_9 + g_8 + g_7 + g_6 + g_5 + g_4 + g_3 + g_2 + g0 + g1 + g2 + g3 + g4 + g5 + g6 + g7 + g8 + g9 + g10 + g11 + g12 + g13 + g14 + g15 + g16 + g17 + g18) / 38)
# And then,
lincom(m1,expr)
# Returns,
                                      lincom                                      
==================================================================================
ln_wage            |    Estimate Std.Error   t value Pr(>|t|)  Lower 95% Upper 95%
----------------------------------------------------------------------------------
Linear Combination | -0.00369359 0.0131072 -0.281798    0.778 -0.0293907 0.0220035
==================================================================================

The results obtained from LinComs.jl can also be used with RegressionTables.jl to generate regression tables.

  • Compare event study estimates for subsamples. Suppose we want to compare the average effect over the first five years of joining the union between college graduates and non-college graduates. We can first estimate their separate effects by interacting the relative time indicators with icator of college graduates:
for k in 18:-1:2
    df[!, Symbol("g_", k, "_collgrad0")] = ifelse.(coalesce.(df.ry .== -k, false) .& coalesce.(df.collgrad .== 0, false), 1, 0)
end
for k in 0:18
    df[!, Symbol("g", k, "_collgrad0")] = ifelse.(coalesce.(df.ry .== k, false) .& coalesce.(df.collgrad .== 0, false), 1, 0)
end
df.g_l4_collgrad0 = ifelse.(coalesce.(df.ry .<= -4, false) .& coalesce.(df.collgrad .== 0, false), 1, 0)
for k in 18:-1:2
    df[!, Symbol("g_", k, "_collgrad1")] = ifelse.(coalesce.(df.ry .== -k, false) .& coalesce.(df.collgrad .== 1, false), 1, 0)
end
for k in 0:18
    df[!, Symbol("g", k, "_collgrad1")] = ifelse.(coalesce.(df.ry .== k, false) .& coalesce.(df.collgrad .== 1, false), 1, 0)
end
df.g_l4_collgrad1 = ifelse.(coalesce.(df.ry .<= -4, false) .& coalesce.(df.collgrad .== 1, false), 1, 0)

rel_varlist3 = Symbol[:g_l4_collgrad0]

for i in 3:-1:2
    push!(rel_varlist3, Symbol("g_"*string(i)*"_collgrad0"))
end

for i in 0:18
    push!(rel_varlist3, Symbol("g"*string(i)*"_collgrad0"))
end

push!(rel_varlist3, :g_l4_collgrad1)
for i in 3:-1:2
    push!(rel_varlist3, Symbol("g_"*string(i)*"_collgrad1"))
end

for i in 0:18
    push!(rel_varlist3, Symbol("g"*string(i)*"_collgrad1"))
end

m4 = eventreg(df, formula1, rel_varlist3, control_cohort1, cohort1, vcov1)

Output

eventreg returns a lightweight regression-model object with the coefficient vector, covariance matrix, estimation-sample indicator, and summary statistics. It supports coef, coefnames, vcov, confint, coeftable, residuals, and related StatsAPI accessors.

We can use RegressionTables.jl to get regression tables. The examples below follow the current render = AsciiTable() / HtmlTable() interface documented by the package.

regtable(
    m1,
    m2,
    m3,
    m4;
    render = AsciiTable(),
    regression_statistics = [Nobs, R2, R2Within],
    estim_decoration = make_estim_decorator([0.01, 0.05, 0.1]),
)

regtable(
    m1,
    m2,
    m3,
    m4;
    render = HtmlTable(),
    file = "base_reg.html",
    regression_statistics = [Nobs, R2, R2Within],
    estim_decoration = make_estim_decorator([0.01, 0.05, 0.1]),
)

Use other features of FixedEffectModels.jl

We can also use other features of FixedEffectModels.jl, such as using GPU for estimation. With current FixedEffectModels.jl versions, use method = :CUDA (or :Metal on supported Apple hardware).

eventreg(df, formula1, rel_varlist1, control_cohort1, cohort1, vcov1, method = :CUDA, double_precision = false)

Another example

Refer to Difference-in-Difference (DiD) | DiD.

# Generate sample data
using Random

units = 30
start = 1
finish = 60
nlvls = 5
time = finish - start + 1
obsv = units * time

df = DataFrame(id = repeat(1:units, inner=time), t = repeat(start:finish, outer=units))

Random.seed!(20211222)

df.Y .= 0 # outcome variable
df.D .= 0 # intervention variable
df.cohort .= repeat([rand(0:nlvls) for i in 1:units], inner=time) # treatment cohort

effect = [rand(2:10) for i in unique(df.cohort)]
first_treat = [rand(start:(finish + 20)) for i in unique(df.cohort)]

df.effect = similar(df.cohort)
df.first_treat = similar(df.cohort, Union{Int64, Missing})

for i in 0:nlvls
    df.effect[df.cohort .== i] .= effect[i+1]
    df.first_treat[df.cohort .== i] .= first_treat[i+1]
end

df.first_treat[df.first_treat .> finish] .= missing


df.D = ifelse.(coalesce.(df.t .>= df.first_treat, false), 1, 0)
df.rel_time .= df.t .- df.first_treat
df.Y = df.id .+ df.t .+ ifelse.(df.D .== 1, df.effect .* df.rel_time, 0) .+ randn(nrow(df))

plot(df.t, df.Y, label=false)

tab1(df,:cohort)

df.never_treat = ismissing.(df.first_treat)

tab1(df,:rel_time)

# We will consider the dynamic effect of union status on income. We first generate these relative time indicators, and leave out the distant leads due to few observations. Implicitly this assumes that effects outside the lead windows are zero.

relmin = abs(minimum(skipmissing(df.rel_time)))
relmax = abs(maximum(skipmissing(df.rel_time)))

for k in relmin:-1:2
    df[!, Symbol("g_$k")] = Int.(coalesce.(df.rel_time .== -k, false))
end

for k in 0:relmax
    df[!, Symbol("g$k")] = Int.(coalesce.(df.rel_time .== k, false))
end

absorb = [:id,:t]

formula1 = term(:Y) ~ sum(fe.(term.(absorb)))

rel_varlist1 = Symbol[]

for i in relmin:-1:2
    push!(rel_varlist1, Symbol("g_"*string(i)))
end

for i in 0:relmax
    push!(rel_varlist1, Symbol("g"*string(i)))
end

control_cohort1 = :never_treat

cohort1 = :first_treat

m5 = eventreg(df, formula1, rel_varlist1, control_cohort1, cohort1)

plot2 = mycoefplot(m5)

Performance

This repository now ships a reproducible benchmark workflow in benchmark/README.md. It generates one synthetic Sun-Abraham style panel, runs the Julia / R / Stata implementations locally, and renders the chart below from the saved timing CSVs.

Event study speed comparison

The current checked-in figure was regenerated on this Windows machine with 10,000 observations (500 units x 20 periods), 24 Julia threads, 24 fixest threads, and a matched Sun-Abraham event window (ref.p = c(-1, -17) on the fixest side). Julia timings exclude first-run JIT compilation by doing one warmup estimation before the timed run.

  • id + t: EventStudyInteracts.jl = 4.83s, fixest = 0.41s, eventstudyinteract = 15.88s
  • id + id1 + id2 + t: EventStudyInteracts.jl = 4.84s, fixest = 0.43s, eventstudyinteract = 16.70s
  • On this checked-in 10k benchmark, EventStudyInteracts.jl is 3.29x faster than Stata for id + t and 3.45x faster than Stata for id + id1 + id2 + t.
  • On this checked-in 10k benchmark, fixest is still the fastest of the three implementations.
  • A larger 100,000-row rerun on the same machine currently gives EventStudyInteracts.jl at 54.99s / 49.48s and fixest at 3.22s / 3.30s, but I am not checking in a new 100k chart until the Stata batch finishes reliably on this machine.

To rerun the benchmark and refresh the figure locally:

./benchmark/run_all.ps1 -Units 5000 -Periods 20

The workflow writes benchmark/results_latest.csv, benchmark/results_latest.md, and benchmark/speed_comparison.svg.

References

GitHub - FixedEffects/FixedEffectModels.jl: Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

lsun20/EventStudyInteract (github.com)

Difference-in-Difference (DiD) | DiD

fixest

About

This package is a Julia replication of the Stata package eventstudyinteract provided by Sun and Abraham (2021).

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors