Add chapter on multiple imputation. #935

Open
abner-hb wants to merge 10 commits into stan-dev:master from abner-hb:master

@abner-hb abner-hb commented Mar 12, 2026

Submission Checklist

  • Builds locally YES
  • New functions marked with <<{ since VERSION }>> YES (no new functions)
  • Declare copyright holder and open-source license: see below

Summary

Add a chapter on multiple imputation to the Stan User's Guide.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Abner Heredia Bustos

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian
Member

Hi @abner-hb, thanks for this! I just made a couple commits to remove the changes to the built docs -- we let our Jenkins jobs build those for us during releases.

I'll ask @bob-carpenter to take a look at the contents when he gets a chance.

@WardBrian WardBrian requested a review from bob-carpenter March 12, 2026 15:35
@bob-carpenter
Member

I can review this.

@bob-carpenter bob-carpenter left a comment

Thanks so much for contributing this.

I am really sorry that I left 73 comments on such a short chapter. It was meant pedagogically and I hope it helps other things you write. It took a couple years of this kind of back-and-forth with Gelman and Vehtari and Goodrich before they stopped marking up everything I wrote this way. Gelman and Vehtari are excellent role models for writing clarity.

If you'd rather not do this, I'm happy to make all the changes I suggested myself.

Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
their precision. So, it is often necessary to account explicitly for
the missing data when fitting a model of interest[^bda].

[^bda]: Chapter 18 in @GelmanEtAl:2013 offers a Bayesian perspective
Member

I just skimmed Chapter 18. Gelman et al. do not provide a fully Bayesian perspective; they instead use multiple imputation. The fully Bayesian perspective is given in the User's Guide chapter on missing data.

I would also put this into the main text. Only use footnotes in the docs as a last resort.

Author

It's not fully Bayesian but isn't it partially Bayesian?

Member

I think it's fair to call it "approximately Bayesian," as that's how Gelman talks about anything from maximum likelihood point estimates to VI.

Comment thread src/stan-users-guide/multiple-imputation.qmd
Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
Comment thread src/stan-users-guide/multiple-imputation.qmd
Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
[**NOTE**: I would greatly appreciate any comments or changes to
improve this subsection.]

A full bayesian probability model includes a feedback flow of
Member

bayesian -> Bayesian

The model doesn't have any feedback or flow per se---it's just how joint distributions work.

influence only some parameters in the model. From @plummer:2015,
p. 37:

> Cut models arise in applications with multiple data sources that
Member

Remove comments.

p. 37:

> Cut models arise in applications with multiple data sources that
provide information about different parameters in the model [...]
Member

OK, rather than filling this all in, I'd just write a one-line summary and point to Plummer's article.

Author

@abner-hb abner-hb Apr 20, 2026

Do you mean summarizing Plummer's quote in one line? Or summarizing this entire section on cut models in one or two lines?

Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
@WardBrian WardBrian requested a review from bob-carpenter April 13, 2026 19:20
Member

@bob-carpenter bob-carpenter left a comment

I went over this again and marked many of the grammatical nitpicky comments as resolved. I think they'd be better the way I was suggesting, but it's more important to actually publish this.

If you don't want to make these changes, @abner-hb, just let me know and I can go and make them myself.

I really appreciate your taking the time to write this despite the huge flurry of comments I've left. I hope they've been more helpful than frustrating, as that was my intention.

Comment thread src/bibtex/all.bib

@article{plummer:2015,
author = {Plummer, Martyn},
title = {Cuts in Bayesian graphical models},
Member

This needs {B}ayesian in the title, or "Bayesian" will get lower-cased.

We generally don't need the url, doi, or publisher, but they're OK to leave in.

And thanks for citing Martyn's paper.
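
As a rough illustration of the brace protection described in this comment, the entry might look like the fragment below. Only the fields visible in the quoted diff are shown; the rest of the entry is assumed to stay as submitted.

```bibtex
Remaining fields of the entry are omitted here.

@article{plummer:2015,
  author = {Plummer, Martyn},
  title = {Cuts in {B}ayesian graphical models},
}
```

Bibliography styles that convert titles to sentence case leave brace-wrapped letters untouched, which is why the inner braces around the "B" keep the capital.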

Comment thread src/stan-users-guide/multiple-imputation.qmd
Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
\end{align*}
where $x^{\text{imp}}$ is a data set that includes imputed values of
$x^{\text{mis}}$.

Member

You might want to note, continuing from line 52, that this depends on a model of $x$, which you typically don't have in a regression, because inference for the parameters is independent of the model of $x$ when $x$ is fully observed.
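
A sketch of why that independence holds, under assumptions not stated in the thread: the regression has parameters $\theta$, the covariate model has separate parameters $\phi$, and the two have independent priors. The symbols $\theta$ and $\phi$ are introduced here only for illustration.

```latex
\begin{align*}
p(\theta, \phi \mid x, y)
&\propto p(y \mid x, \theta) \, p(x \mid \phi) \, p(\theta) \, p(\phi) \\
&= \bigl[ p(y \mid x, \theta) \, p(\theta) \bigr]
   \, \bigl[ p(x \mid \phi) \, p(\phi) \bigr].
\end{align*}
```

With $x$ fully observed, the two bracketed factors separate, so $p(\theta \mid x, y) \propto p(y \mid x, \theta) \, p(\theta)$ whatever model is chosen for $x$. Once part of $x$ is missing, $x^{\text{mis}}$ appears in both factors, and the model of $x$ can no longer be dropped.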

Author

See the new paragraph at line 60 below.

Comment thread src/stan-users-guide/multiple-imputation.qmd
Comment thread src/stan-users-guide/multiple-imputation.qmd Outdated
Comment thread src/stan-users-guide/multiple-imputation.qmd
Comment thread src/stan-users-guide/multiple-imputation.qmd
Comment thread src/stan-users-guide/multiple-imputation.qmd