Support model methods & standalone functions for models with external C++ (user_header)#1197
Support model methods & standalone functions for models with external C++ (user_header)#1197ssp3nc3r wants to merge 1 commit into
Conversation
…ders `compile_model_methods = TRUE` and `compile_standalone = TRUE` previously failed for any model using an external C++ `user_header`, for two reasons: 1. The standalone / model-methods C++ was generated by calling stanc directly (via an argument vector, no shell) using the same quoted `--name='foo_model'` flag that is built for the make-based executable build. With no shell to strip them, the quotes became part of the model name, producing the namespace `'foo_model'_namespace`. External C++ functions, which are defined in `foo_model_namespace`, were therefore not visible and compilation failed with "use of undeclared identifier". 2. `rcpp_source_stan()` did not `-include` the user header when compiling the model methods via `Rcpp::sourceCpp()`, unlike CmdStan's makefile which adds `-include $(USER_HEADER)`. Pass an unquoted form of the stanc options to `get_standalone_hpp()`, and force-include the user header in `rcpp_source_stan()`. With both fixes, `grad_log_prob()` on a model with a hand-written C++ likelihood matches the pure-Stan autodiff version exactly. Adds a regression test.
62ffb92 to
c4534ec
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1197 +/- ##
==========================================
+ Coverage 91.10% 91.16% +0.06%
==========================================
Files 15 15
Lines 6125 6138 +13
==========================================
+ Hits 5580 5596 +16
+ Misses 545 542 -3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Thanks for the PR! Here are some initial review comments after a first quick look.
Also, was any of this written by AI? Either way is fine, I'm just asking because we recently adopted a new AI policy. The policy allows using AI tools but "if AI tools were used, the contributor additionally affirms that they have reviewed and understood the code, and can explain and defend their changes during the review process." Eventually we'll have some more info about this that comes up when opening a PR.
| # can be rebuilt lazily in the current session if they are later requested. | ||
| # This avoids an error when a CmdStanModel object with compiled bindings is | ||
| # loaded from an older session: https://github.com/stan-dev/cmdstanr/issues/1157 | ||
| drop_stale_model_methods <- function(env) { |
There was a problem hiding this comment.
user_header_ will be dropped here, is that desirable? the PR stores the header, but saved/reloaded models keep only hpp_code_.
To see the potential issue: in a fresh R process save a CmdStanModel with compile_model_methods = TRUE and user_header. Then reload it, fit, then
call fit$log_prob(). There's an error about undeclared identifier 'make_odds'.
Should we preserve user_header_ in drop_stale_model_methods()? Is that the desired behavior? If so, we should probably also add a regression test for saved/reloaded external-header models.
| if (global) { | ||
| rcpp_source_stan(mod_stan_funs, globalenv(), verbose) |
There was a problem hiding this comment.
Here I think we might lose user_header_ when global=TRUE. rcpp_source_stan() reads env$user_header_, but when global=TRUE we pass globalenv() instead of the function env that has the header.
I haven't reproduced this myself, but this should do it:
In a fresh R session with an external standalone function, try mod$expose_functions() (should work) and then try mod$expose_functions(global = TRUE) (probably fails with no function definition in the namespace).
To fix this I think we'd either need to (a) separate the target environment from the metadata/header source, or (b) compile with the model function env and copy exports to global afterward. Option (b) seems like a cleaner approach, so I'd lean towards that unless you have any other ideas.
| * `compile_model_methods = TRUE` and `compile_standalone = TRUE` now work for | ||
| models that use an external C++ `user_header`. The standalone / model-methods |
There was a problem hiding this comment.
I think we need to test expose_functions (with global = FALSE and global = TRUE) to be able to fully claim compile_standalone = TRUE / expose_functions() support
|
Thanks for the review help. I wrote it but I'm sure I used AI to figure out stuff. "if AI tools were used, the contributor additionally affirms that they have reviewed and understood the code, and can explain and defend their changes during the review process." I affirm. On the other points, I hadn't thought about them because I just wrote it for my own pain point, which was just comparing the forward and reverse for two models (pure Stan's AD vs hand-coded C++, hence the user header) to check the gradient work before using it, if that makes sense. I've only been using it in that isolated case. I'll need to think through the above. |
|
@andrjohns Do you by any chance have time to take a look at this at some point? I made a few review comments but you're obviously much more familiar with the model methods and expose functions code. |
Yep, happy to take this one! |
Submission Checklist
Summary
compile_model_methods = TRUE(the$log_prob()/$grad_log_prob()/$hessian()methods) andcompile_standalone = TRUE(expose_functions()) currently fail to compile for any model that uses an external C++user_header— even though the very same model samples without issue. This blocks a common workflow: validating a hand-written C++ log density / analytic gradient against the pure-Stan autodiff version viagrad_log_prob().Reproducible example
bernoulli_external.stan(declaresmake_odds, defined in the header):make_odds.hpp(in the model's namespace, per the CmdStan convention):mod$sample()works, but compiling the model methods aborts with:Root cause
Two independent problems in the standalone / model-methods build path:
Mangled namespace. The standalone & model-methods C++ is generated by calling
stancdirectly through an argument vector (no shell) inget_standalone_hpp(), reusing the same--name='foo_model'flag that is built for the make-based executable build. The make path is processed by a shell that strips the quotes; the direct call is not, so the literal quotes become part of the model name, producing the namespace'foo_model'_namespace(x39model...x39_namespaceafter C++ name-mangling). External C++ functions are defined by the user infoo_model_namespace, so they are not visible to the generated code.Header not included.
rcpp_source_stan()does not-includethe user header when compiling the methods viaRcpp::sourceCpp(), unlike CmdStan's makefile, which adds-include $(USER_HEADER).Fix
get_standalone_hpp()calls; the make path keeps the quoted form (so names/paths with spaces are still safe there).self$functions) and force-include it inrcpp_source_stan(), mirroring CmdStan's-include $(USER_HEADER).This also enables
compile_standalone = TRUE/expose_functions()for models with external C++, which sharercpp_source_stan().Verification
For a normal model and an identical model whose single likelihood term is routed through an external C++ function, with the fix
log_prob()andgrad_log_prob()match the pure-Stan autodiff version exactly:Adds a regression test in
tests/testthat/test-model-methods.Rusing the existingbernoulli_externalfixture (fulltest-model-methods.Rpasses locally on macOS / CmdStan 2.39.0).Copyright and Licensing
Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Scott Spencer
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: