This package is a Julia replication of the Stata package eventstudyinteract provided by Sun and Abraham (2021).
The estimation process is identical to that of eventstudyinteract, and this package used FixedEffectModels.jl to implement the Stata command reghdfe functionality.
As introduced in the Stata package eventstudyinteract provided by Sun and Abraham (2021), the function of this package is:
eventstudyinteractimplements the interaction weighted (IW) estimator and constructs pointwise confidence interval for the estimation of dynamic treatment effects. To estimate the dynamic effects of an absorbing treatment, researchers often use two-way fixed effects (TWFE) regressions that include leads and lags of the treatment (event study specification). Units are categorized into different cohorts based on their initial treatment timing. Sun and Abraham (2020) proposes this estimator as an alternative to the TWFE regression in the presence of treatment effects heterogeneous across cohorts. Under treatment effects heterogeneity, the TWFE regression can result in estimates with uninterpretable weights, which can be assessed by the Stata module eventstudyweights. The IW estimator is implemented in three steps. First, estimate the interacted regression with reghdfe, where the interactions are between relative time indicators and cohort indicators. Second, estimate the cohort shares underlying each relative time. Third, take the weighted average of estimates from the first step, with weights set to the estimated cohort shares.
-
Julia is faster than Stata, and
FixedEffectModels.jlis much faster than reghdfe. -
I am a beginner in programming and the code is a complete translation of the Stata package
eventstudyinteract. I used newbing to help me with programming, and it is very helpful for beginners. -
I used
FixedEffectModels.jldirectly for estimation and I am grateful to the author of this package. Other parts of this package, such as reporting of estimation results, also refer toFixedEffectModels.jl. -
The Stata package
eventstudyinteractis very flexible, and one important feature is that it can compare event study estimates for subsamples, i.e. perform heterogeneous estimation.fixestcan estimate Sun-Abraham specifications throughsunab(...), but I still find the workflow design of the Stata package more user-friendly for these heterogeneous-estimation use cases. -
This package fully supports the features of
FixedEffectModels.jl, such as using GPU for estimation. The syntax of this package is similar to that ofFixedEffectModels.jl.
EventStudyInteracts.jl is registered in the General registry.
The package is currently maintained against:
Julia 1.10+FixedEffectModels.jl 1.13Vcov.jl 0.8
You can install it at the REPL with ] add EventStudyInteracts.
For local development in this repository, run julia --project=. -e "using Pkg; Pkg.instantiate()".
For formatted regression tables, pair this package with RegressionTables.jl. The examples below use the current v0.7.x API; recent releases are listed on the official RegressionTables.jl releases page.
Before pushing a maintenance update, use the following local checks:
.\scripts\run_pkg_test.ps1
julia --project=. .\scripts\single_cohort_equivalence.jlWhat these two commands cover:
scripts/run_pkg_test.ps1bootstraps a local depot for this repository, rebuilds the local test manifest if needed, and runs the fullPkg.test()suite.scripts/single_cohort_equivalence.jlgenerates a synthetic panel with one treated cohort and verifies thateventreg(...)matches a directFixedEffectModels.reg(...)event-study regression exactly, including period-by-period standard errors and the aggregatedATE(g0:g3).
A change is ready to push to GitHub when all of the following are true:
Pkg.test()passes locally.- The single-cohort equivalence check reports zero differences.
- The checked-in Stata reference comparison still passes.
- README examples and benchmark claims are consistent with the current code.
This repository includes separate maintenance automation in addition to CI:
CIruns package tests onmain.CompatHelperwatches dependency compat updates.TagBothandles tags and releases.Maintenanceruns weekly and checks whetherFixedEffectModels.jlhas a newer upstream release, runsJulia 1.10tests, runs aJuliastable import smoke test, and writes a maintenance report.- The main test suite also includes a
RegressionTables.jlsmoke test so table output stays visible during dependency upgrades.
The maintenance check compares the nlswork example against the checked-in Stata baseline in reference/nlswork_stata.toml. Any additional *.toml file placed in reference/ is picked up automatically, so you can add reference/boss_reference.toml to compare against your boss's saved results as well.
If you want CompatHelper to open PRs automatically on GitHub, add the repository secret COMPATHELPER_PRIV.
This repository is maintained with Codex, using the current GPT-5 coding model together with the local Julia, R, and Stata toolchain on this machine.
The default maintenance workflow is:
- Weekly monitoring:
Maintenancechecks upstream dependency drift, especiallyFixedEffectModels.jl, and writes a machine-generated report. - Upgrade trigger: when
FixedEffectModels.jlor another important dependency changes,Codexupdates compat bounds, adapts the package code, and reruns the local validation workflow. - Validation gate: the change is not considered complete until
Pkg.test()passes, theRegressionTables.jlsmoke test passes, and the checked-in Stata reference comparison passes. - Documentation refresh: if behavior, compat, or benchmark results change,
README.md, benchmark artifacts, and reference files are updated in the same maintenance cycle. - Optional external parity: if
reference/boss_reference.tomlis present, the same workflow also checks the package against your boss's saved output.
This workflow is intended to guarantee the following:
EventStudyInteracts.jlremains compatible withJulia 1.10 LTS, with an additional stable-version import smoke check.- The
nlsworkreference example remains numerically close to the checked-in Stata baseline, within the tolerances stored inreference/nlswork_stata.toml. RegressionTables.jlintegration keeps working as part of the main automated test suite.- README performance claims are backed by checked-in scripts and saved benchmark outputs under
benchmark/.
This workflow does not guarantee the following:
- Exact equality for all summary metadata across package versions. Values such as degrees of freedom, adjusted
R2, or iteration counts can change slightly acrossFixedEffectModels.jlreleases even when the core coefficient vector remains aligned. - Equality to any outside result that is not checked into
reference/. - Fully automatic release decisions.
Codexcan implement and validate changes, but versioning, tagging, and registration still follow the normal repository release process.
result = eventreg(
df,
formula,
rel_varlist::Vector{Symbol},
control_cohort::Symbol,
cohort::Symbol,
vcov;
contrasts = contrasts,
weights = weights,
save = save,
method = method,
nthreads = nthreads,
double_precision = double_precision,
tol = tol,
maxiter = maxiter,
drop_singletons = drop_singletons,
progress_bar = progress_bar,
dof_add = dof_add,
subset = subset,
)The syntax is similar to FixedEffectModels.jl, and the current package line is tested against FixedEffectModels.jl 1.13 on Julia 1.10+. Users need to specify rel_varlist::Vector{Symbol}, control_cohort::Symbol, and cohort::Symbol explicitly. The rel_varlist is the list of relative time indicators as you would have included in the canonical two-way fixed effects regression. The rel_varlist must be defined as a Vector{Symbol}.
dof_add is still accepted for backward compatibility, but it is currently ignored when this package is used with FixedEffectModels.jl 1.13+.
rel_varlist = Symbol[:g_4, :g_3, :g_2, :g0, :g1, :g2, :g3]
See illustration for an example of generating these relative time indicators. The relative time indicators should take the value of zero for never treated units. Users should shape their dataset to a long format where each observation is at the unit-time level. See illustration for an example of specifying the syntax. The syntax is similar to FixedEffectModels.jl in specifying fixed effects and the type of standard error reported.
Furthermore, it also requires the user to specify the cohort categories as well as which cohort is the control cohort. Note that Sun and Abraham (2021) only establishes the validity of the IW estimators for balanced panel data without covariates. eventstudyinteract does not impose a balanced panel by dropping units with observations that are missing in any iod. Instead, it evaluates the IW estimators for unbalanced panel data by estimating the interacted regression using all observations.
The control_cohort must be defined as a Symbol, which is a binary variable that corresponds to the control cohort, which can be never-treated units or last-treated units. If using last-treated unit as control cohort, exclude the time periods when the last cohort receives treatment.
The cohort must be defined as a Symbol, which is a categorical variable that corresponds to the initial treatment timing of each unit. If there are units that receive multiple treatments, Sun and Abraham (2021) defines the initial treatment timing to be based on the first treatment. This categorical variable should be set to be missing for never treated units.
When all treated units start treatment in the same period, eventreg(...) should collapse to the same event-study regression you would estimate directly with FixedEffectModels.jl. In that special case there is only one treated cohort, so the IW weights are all equal to one and no extra share-aggregation uncertainty should remain.
This repository checks that case explicitly with scripts/single_cohort_equivalence.jl. The script generates a deterministic panel, estimates the model both ways, and compares every event-time coefficient, every standard error, and the average post-treatment effect ATE(g0:g3) under four covariance estimators. The same logic also runs inside Pkg.test().
To rerun the check locally:
julia --project=. .\scripts\single_cohort_equivalence.jlOn the current checked-in synthetic panel, the maximum absolute differences are:
| Vcov | max abs coef diff | max abs se diff | max abs vcov diff | abs ATE diff | abs ATE se diff |
|---|---|---|---|---|---|
simple |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
robust |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
cluster(:id) |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
cluster(:id, :t) |
0.0 |
0.0 |
0.0 |
0.0 |
0.0 |
For the two-way clustered case (cluster(:id, :t)), the period-by-period estimates and standard errors are:
| Term | Direct FEM estimate | eventreg estimate |
Direct FEM std. error | eventreg std. error |
|---|---|---|---|---|
g_3 |
-0.71994246 |
-0.71994246 |
0.34987645 |
0.34987645 |
g_2 |
0.067715152 |
0.067715152 |
0.32766335 |
0.32766335 |
g0 |
-0.29595699 |
-0.29595699 |
0.33389899 |
0.33389899 |
g1 |
2.2834434 |
2.2834434 |
0.33145837 |
0.33145837 |
g2 |
2.094273 |
2.094273 |
0.34614113 |
0.34614113 |
g3 |
2.6978103 |
2.6978103 |
0.33474164 |
0.33474164 |
ATE(g0:g3) |
1.6948924 |
1.6948924 |
0.33402829 |
0.33402829 |
This equivalence is a sanity check for the single-cohort edge case only. With multiple treated cohorts, a direct TWFE-style event-study regression and the Sun-Abraham IW estimator generally target different objects, which is exactly why this package exists.
- You can download the
nlswork.dtadata from the repository, which is the example data used byeventstudyinteract, and the method of generating variables is the same as that ofeventstudyinteract.
using DataFrames
using FixedEffectModels
using EventStudyInteracts
using ReadStatTables
using RegressionTables
using CUDA
using Plots
# Load the 1968 extract of the National Longitudinal Survey of Young Women and Mature Women.
df = DataFrame(readstat(joinpath("dataset", "nlswork.dta")))
# Code the cohort categorical variable based on when the individual first joined the union, which will be inputted in cohort(varname).
df.union_year = ifelse.(coalesce.(df.union, false) .== 1, df.year, missing)
# This is really frustrating. In Stata, the simple code bysort idcode: egen first_union = min(union_year) can achieve this. Why is Julia so complicated?
function min_skipmissing(x)
v = collect(skipmissing(x))
return isempty(v) ? missing : minimum(v)
end
transform!(groupby(df, :idcode), :union_year => min_skipmissing => :first_union)
select!(df, Not(:union_year))
# Code the relative time categorical variable.
df.ry = df.year - df.first_union
# For the first example, we take the control cohort to be individuals that never unionized.
df.never_union = ismissing.(df.first_union)
# Check if there is a sufficient number of treated units for each relative time. With very few units it might be better to bin the relative times and assume constant treatment effects within the bin.
function tab1(df::DataFrame, catevar::Symbol)
gdf = groupby(df, catevar)
result = combine(gdf, nrow => :freq)
result.percent = result.freq / nrow(df) * 100
sort!(result, catevar)
result.cum = cumsum(result.percent)
return result
end
tab1(df,:ry)
# We will consider the dynamic effect of union status on income. We first generate these relative time indicators, and leave out the distant leads due to few observations. Implicitly this assumes that effects outside the lead windows are zero.
relmin = abs(minimum(skipmissing(df.ry)))
relmax = abs(maximum(skipmissing(df.ry)))
for k in relmin:-1:2
df[!, Symbol("g_$k")] = Int.(coalesce.(df.ry .== -k, false))
end
for k in 0:relmax
df[!, Symbol("g$k")] = Int.(coalesce.(df.ry .== k, false))
end
absorb = [:idcode,:year]
formula1 = term(:ln_wage) ~ term(:south) + sum(fe.(term.(absorb)))
# To test whether the package can run successfully when there are collinear variables.
df.g_19 .= 0
# The rel_varlist must be defined as a Vector{Symbol}. In the example below, when defining an empty Vector, you cannot use rel_varlist1 = [].
rel_varlist1 = Symbol[:g_4, :g_5]
for i in relmin:-1:2
push!(rel_varlist1, Symbol("g_"*string(i)))
end
for i in 0:relmax
push!(rel_varlist1, Symbol("g"*string(i)))
end
control_cohort1 = :never_union
cohort1 = :first_union
vcov1 = Vcov.cluster(:idcode)
m1 = eventreg(df, formula1, rel_varlist1, control_cohort1, cohort1, vcov1)
# EventStudyInteract IW estimator
# ========================================================================
# Number of obs: 27979 Degrees of freedom: 166
# R2: 0.666 R2 Adjusted: 0.664
# F-Stat: 2.81691 p-value: 0.000
# R2 within: 0.030 Iterations: 11
# ========================================================================
# ln_wage | Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
# ------------------------------------------------------------------------
# g_19 | 0.0 0.0 NaN NaN 0.0 0.0
# g_18 | -0.0609283 0.0513893 -1.18562 0.236 -0.161679 0.0398221
# g_17 | -0.0489819 0.0432146 -1.13346 0.257 -0.133706 0.0357418
# g_16 | -0.0680484 0.0495311 -1.37385 0.170 -0.165156 0.0290589
# g_15 | -0.135218 0.0448805 -3.01284 0.003 -0.223208 -0.0472281
# g_14 | -0.103295 0.0443376 -2.32975 0.020 -0.190221 -0.01637
# g_13 | -0.0684827 0.0416284 -1.6451 0.100 -0.150097 0.0131311
# g_12 | -0.132015 0.0385567 -3.42392 0.001 -0.207606 -0.0564232
# g_11 | -0.15239 0.0376567 -4.04683 0.000 -0.226217 -0.078563
# g_10 | -0.11842 0.0291013 -4.06925 0.000 -0.175474 -0.0613663
# g_9 | -0.172996 0.0330077 -5.24109 0.000 -0.237709 -0.108284
# g_8 | -0.116192 0.030757 -3.77773 0.000 -0.176492 -0.0558915
# g_7 | -0.140938 0.0287493 -4.90233 0.000 -0.197302 -0.0845745
# g_6 | -0.133223 0.0369878 -3.60181 0.000 -0.205739 -0.0607073
# g_5 | -0.131416 0.0238826 -5.50261 0.000 -0.178239 -0.0845938
# g_4 | -0.150598 0.0330246 -4.56016 0.000 -0.215344 -0.0858518
# g_3 | -0.0559142 0.0202356 -2.76316 0.006 -0.0955869 -0.0162416
# g_2 | -0.0565837 0.0191366 -2.95683 0.003 -0.0941016 -0.0190657
# g0 | 0.0515795 0.0146284 3.52598 0.000 0.0229 0.0802591
# g1 | 0.0961542 0.0194692 4.93879 0.000 0.0579842 0.134324
# g2 | 0.0677024 0.0182556 3.70858 0.000 0.0319116 0.103493
# g3 | 0.0663372 0.018431 3.59923 0.000 0.0302027 0.102472
# g4 | 0.132223 0.0276378 4.78414 0.000 0.0780381 0.186408
# g5 | 0.0565864 0.0182183 3.10603 0.002 0.0208689 0.0923039
# g6 | 0.117224 0.0215077 5.4503 0.000 0.0750571 0.15939
# g7 | 0.072123 0.0221207 3.26042 0.001 0.0287545 0.115491
# g8 | 0.0666257 0.0193959 3.43504 0.001 0.0285993 0.104652
# g9 | 0.0929835 0.0349948 2.65707 0.008 0.024375 0.161592
# g10 | 0.0675955 0.0237323 2.84825 0.004 0.0210676 0.114124
# g11 | 0.106953 0.0212674 5.02896 0.000 0.0652575 0.148649
# g12 | 0.0647985 0.0365021 1.7752 0.076 -0.00676515 0.136362
# g13 | 0.134787 0.0412699 3.26599 0.001 0.0538762 0.215698
# g14 | 0.155875 0.0501801 3.10631 0.002 0.0574951 0.254255
# g15 | 0.0665542 0.0426601 1.56011 0.119 -0.0170823 0.150191
# g16 | 0.181412 0.0511286 3.54815 0.000 0.0811727 0.281651
# g17 | 0.0233864 0.0438817 0.532943 0.594 -0.0626452 0.109418
# g18 | -0.0477967 0.0750543 -0.636828 0.524 -0.194943 0.0993498
# ========================================================================The repository maintenance workflow checks this example automatically against the checked-in Stata reference. The coefficient vector is expected to match very closely; summary metadata such as degrees of freedom, adjusted R2, or iteration counts can move slightly across FixedEffectModels.jl versions.
- Event study plots. We may define a
coefplotfunction for an event study plot.
function mycoefplot(m)
n = coefnames(m)
vals = coef(m)
errors = stderror(m)
Plots.scatter(
n,
vals,
legend = false,
yerror = 1.96 .* errors,
#title = "Coefficient plot",
xtickfont = font(10, "Times"),
ytickfont = font(10, "Times"),
titlefont = font(14, "Times"),
# markersize = 2
)
Plots.vline!([3.5,],linestyle = :dash,linecolor=:red)
Plots.hline!([0,],linestyle = :dash,linecolor=:black,linewidth=:2)
end
plot1 = mycoefplot(m1)- Binning. Pre-treatment effects seem relatively constant, which might suggest binning the many leads. TODO: current implementation of bins does not follow Sun and Abraham (2020) exactly due to coding challenge. But it is valid if effects in the bin are constant for each cohort.
df.g_l4 = Int.(coalesce.(df.ry .<= -4, false))
rel_varlist2 = Symbol[:g_l4]
for i in 3:-1:2
push!(rel_varlist2, Symbol("g_"*string(i)))
end
for i in 0:18
push!(rel_varlist2, Symbol("g"*string(i)))
end
m2 = eventreg(df, formula1, rel_varlist2, control_cohort1, cohort1, vcov1)- Using the last treated as the control cohort. Alternatively, we can take the control cohort to be individuals that were unionized last.
df.last_union = Int.(coalesce.(df.first_union .== 88, false))
control_cohort2 = :last_union
# If using the last-treated cohort as the control, be sure to restrict the analysis sample to exclude the never-treated (if any) and to be before the treated periods for the last-treated cohort.
m3 = eventreg(df[(.!ismissing.(df.first_union)) .& (df.year .< 88), :], formula1, rel_varlist2, control_cohort1, cohort1, vcov1)- Aggregating event study estimates. In Stata one can use
lincomlooks for coefficients and variance covariance matrix stored in e(b) and e(V). There is no package likelincomin Julia. I wrote another packageLinComs.jlto do this. You can refer to the following code after estimating the results.
rel_varlist1[20:38]
expr = Expr(:call, :/, Expr(:call, :+, rel_varlist1...), length(rel_varlist1))
# Or,
expr = :((g_20 + g_19 + g_18 + g_17 + g_16 + g_15 + g_14 + g_13 + g_12 + g_11 + g_10 + g_9 + g_8 + g_7 + g_6 + g_5 + g_4 + g_3 + g_2 + g0 + g1 + g2 + g3 + g4 + g5 + g6 + g7 + g8 + g9 + g10 + g11 + g12 + g13 + g14 + g15 + g16 + g17 + g18) / 38)
# And then,
lincom(m1,expr)
# Returns,
lincom
==================================================================================
ln_wage | Estimate Std.Error t value Pr(>|t|) Lower 95% Upper 95%
----------------------------------------------------------------------------------
Linear Combination | -0.00369359 0.0131072 -0.281798 0.778 -0.0293907 0.0220035
==================================================================================The results obtained from LinComs.jl can also be used with RegressionTables.jl to generate regression tables.
- Compare event study estimates for subsamples. Suppose we want to compare the average effect over the first five years of joining the union between college graduates and non-college graduates. We can first estimate their separate effects by interacting the relative time indicators with icator of college graduates:
for k in 18:-1:2
df[!, Symbol("g_", k, "_collgrad0")] = ifelse.(coalesce.(df.ry .== -k, false) .& coalesce.(df.collgrad .== 0, false), 1, 0)
end
for k in 0:18
df[!, Symbol("g", k, "_collgrad0")] = ifelse.(coalesce.(df.ry .== k, false) .& coalesce.(df.collgrad .== 0, false), 1, 0)
end
df.g_l4_collgrad0 = ifelse.(coalesce.(df.ry .<= -4, false) .& coalesce.(df.collgrad .== 0, false), 1, 0)
for k in 18:-1:2
df[!, Symbol("g_", k, "_collgrad1")] = ifelse.(coalesce.(df.ry .== -k, false) .& coalesce.(df.collgrad .== 1, false), 1, 0)
end
for k in 0:18
df[!, Symbol("g", k, "_collgrad1")] = ifelse.(coalesce.(df.ry .== k, false) .& coalesce.(df.collgrad .== 1, false), 1, 0)
end
df.g_l4_collgrad1 = ifelse.(coalesce.(df.ry .<= -4, false) .& coalesce.(df.collgrad .== 1, false), 1, 0)
rel_varlist3 = Symbol[:g_l4_collgrad0]
for i in 3:-1:2
push!(rel_varlist3, Symbol("g_"*string(i)*"_collgrad0"))
end
for i in 0:18
push!(rel_varlist3, Symbol("g"*string(i)*"_collgrad0"))
end
push!(rel_varlist3, :g_l4_collgrad1)
for i in 3:-1:2
push!(rel_varlist3, Symbol("g_"*string(i)*"_collgrad1"))
end
for i in 0:18
push!(rel_varlist3, Symbol("g"*string(i)*"_collgrad1"))
end
m4 = eventreg(df, formula1, rel_varlist3, control_cohort1, cohort1, vcov1)eventreg returns a lightweight regression-model object with the coefficient vector, covariance matrix, estimation-sample indicator, and summary statistics. It supports coef, coefnames, vcov, confint, coeftable, residuals, and related StatsAPI accessors.
We can use RegressionTables.jl to get regression tables. The examples below follow the current render = AsciiTable() / HtmlTable() interface documented by the package.
regtable(
m1,
m2,
m3,
m4;
render = AsciiTable(),
regression_statistics = [Nobs, R2, R2Within],
estim_decoration = make_estim_decorator([0.01, 0.05, 0.1]),
)
regtable(
m1,
m2,
m3,
m4;
render = HtmlTable(),
file = "base_reg.html",
regression_statistics = [Nobs, R2, R2Within],
estim_decoration = make_estim_decorator([0.01, 0.05, 0.1]),
)We can also use other features of FixedEffectModels.jl, such as using GPU for estimation. With current FixedEffectModels.jl versions, use method = :CUDA (or :Metal on supported Apple hardware).
eventreg(df, formula1, rel_varlist1, control_cohort1, cohort1, vcov1, method = :CUDA, double_precision = false)Refer to Difference-in-Difference (DiD) | DiD.
# Generate sample data
using Random
units = 30
start = 1
finish = 60
nlvls = 5
time = finish - start + 1
obsv = units * time
df = DataFrame(id = repeat(1:units, inner=time), t = repeat(start:finish, outer=units))
Random.seed!(20211222)
df.Y .= 0 # outcome variable
df.D .= 0 # intervention variable
df.cohort .= repeat([rand(0:nlvls) for i in 1:units], inner=time) # treatment cohort
effect = [rand(2:10) for i in unique(df.cohort)]
first_treat = [rand(start:(finish + 20)) for i in unique(df.cohort)]
df.effect = similar(df.cohort)
df.first_treat = similar(df.cohort, Union{Int64, Missing})
for i in 0:nlvls
df.effect[df.cohort .== i] .= effect[i+1]
df.first_treat[df.cohort .== i] .= first_treat[i+1]
end
df.first_treat[df.first_treat .> finish] .= missing
df.D = ifelse.(coalesce.(df.t .>= df.first_treat, false), 1, 0)
df.rel_time .= df.t .- df.first_treat
df.Y = df.id .+ df.t .+ ifelse.(df.D .== 1, df.effect .* df.rel_time, 0) .+ randn(nrow(df))
plot(df.t, df.Y, label=false)
tab1(df,:cohort)
df.never_treat = ismissing.(df.first_treat)
tab1(df,:rel_time)
# We will consider the dynamic effect of union status on income. We first generate these relative time indicators, and leave out the distant leads due to few observations. Implicitly this assumes that effects outside the lead windows are zero.
relmin = abs(minimum(skipmissing(df.rel_time)))
relmax = abs(maximum(skipmissing(df.rel_time)))
for k in relmin:-1:2
df[!, Symbol("g_$k")] = Int.(coalesce.(df.rel_time .== -k, false))
end
for k in 0:relmax
df[!, Symbol("g$k")] = Int.(coalesce.(df.rel_time .== k, false))
end
absorb = [:id,:t]
formula1 = term(:Y) ~ sum(fe.(term.(absorb)))
rel_varlist1 = Symbol[]
for i in relmin:-1:2
push!(rel_varlist1, Symbol("g_"*string(i)))
end
for i in 0:relmax
push!(rel_varlist1, Symbol("g"*string(i)))
end
control_cohort1 = :never_treat
cohort1 = :first_treat
m5 = eventreg(df, formula1, rel_varlist1, control_cohort1, cohort1)
plot2 = mycoefplot(m5)This repository now ships a reproducible benchmark workflow in benchmark/README.md. It generates one synthetic Sun-Abraham style panel, runs the Julia / R / Stata implementations locally, and renders the chart below from the saved timing CSVs.
The current checked-in figure was regenerated on this Windows machine with 10,000 observations (500 units x 20 periods), 24 Julia threads, 24 fixest threads, and a matched Sun-Abraham event window (ref.p = c(-1, -17) on the fixest side). Julia timings exclude first-run JIT compilation by doing one warmup estimation before the timed run.
id + t:EventStudyInteracts.jl=4.83s,fixest=0.41s,eventstudyinteract=15.88sid + id1 + id2 + t:EventStudyInteracts.jl=4.84s,fixest=0.43s,eventstudyinteract=16.70s- On this checked-in 10k benchmark,
EventStudyInteracts.jlis3.29xfaster than Stata forid + tand3.45xfaster than Stata forid + id1 + id2 + t. - On this checked-in 10k benchmark,
fixestis still the fastest of the three implementations. - A larger
100,000-row rerun on the same machine currently givesEventStudyInteracts.jlat54.99s / 49.48sandfixestat3.22s / 3.30s, but I am not checking in a new 100k chart until the Stata batch finishes reliably on this machine.
To rerun the benchmark and refresh the figure locally:
./benchmark/run_all.ps1 -Units 5000 -Periods 20The workflow writes benchmark/results_latest.csv, benchmark/results_latest.md, and benchmark/speed_comparison.svg.
lsun20/EventStudyInteract (github.com)