Improve the GMM testing by leveraging the SMT#45
CB-quakemodel wants to merge 29 commits into master from
Conversation
Force-pushed from 5d1219f to fb86e36
@CB-quakemodel, thanks tremendously for this. This is excellent work, but I have some reservations. Maybe you can explain some of the choices to me a bit better.

First, let me say that I think the use of large custom classes to store data is a mistake, unless there are specific performance or other reasons for it. It makes the code harder to understand and harder to use in an ad-hoc way: you can easily import a function that operates on a numpy array, but if the function is a method of some class, you have to instantiate the whole class to use it, even if you aren't interested in 90% of the class, and you may need a lot of extraneous data on hand to build the class. Classes are also a lot more work to test (for the same set-up/tear-down reasons). You have to write a lot of specific code to save an instantiated class to a file, whereas saving a dict is easy. And you have to dig hard in the REPL to see all the data and methods, so they are hard to debug.

Hamlet used to use a lot more of these, because I was following what was then the standard practice at GEM, and I spent months redoing things to use more standard Python data structures (dicts, dataframes, etc.) instead. A few cases remain for performance reasons or to integrate better with the OQ Engine. So for some of the structure of this PR, I don't think we want to create a single large class that holds the evaluation data and results. The rest of the evals manage fine with a dictionary that has config, data, results, etc., and it is best to continue with the same pattern throughout.

As we've discussed, I also don't want to require the MBTK as a dependency, even an optional one, if it can be avoided. However, there may be real drawbacks to not using the SMT. (I don't think performance is one, as there are generally a few tens of EQs max in a given model with observations--though I haven't run Japan yet...)
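To make the point concrete, here is a minimal sketch of the dict-based pattern described above. The names `run_gmm_eval` and `compute_bias` are hypothetical, purely for illustration, and not the actual Hamlet API:

```python
import numpy as np


def compute_bias(residuals):
    """Mean of an array of (ln) residuals.

    A free function like this can be imported and applied to any numpy
    array directly, with no class to instantiate first.
    """
    return float(np.mean(residuals))


def run_gmm_eval(config, residuals):
    """Return results in the standard {config, data, results} dict shape
    used by the other evals, which is trivially serializable."""
    return {
        "config": config,
        "data": {"residuals": residuals},
        "results": {"bias": compute_bias(residuals)},
    }
```

Because the container is a plain dict, it can be dumped to JSON or inspected in the REPL with no custom save/load or `__repr__` machinery.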
Again, thanks a lot. I think this will really help us get to the forefront of whole-model evaluations. I'm also happy to have a call to discuss if you'd like.
Expand the initial GMM testing to compute, per GMM and per TRT, the total, inter-event and intra-event residuals, plus some overall summary plots. Addresses #20
I extend the same ContextDB used in the SMT so we can use the SMT's existing capabilities to compute the partitioned random-effects residuals. We can then also use its plotting functions to provide summary plots of the residuals. Right now it is hardcoded to compute the GRM IMTs (PGA, SA(0.3), SA(0.6) and SA(1.0)).
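For reference, the random-effects partition of this kind can be sketched in plain numpy. This is an illustrative implementation of the classic Abrahamson & Youngs (1992) closed-form estimator, assuming the inter-event (tau) and intra-event (phi) standard deviations are known; it is not the SMT's actual code:

```python
import numpy as np


def partition_residuals(total, event_ids, tau, phi):
    """Split total (ln) residuals into inter- and intra-event parts.

    For event i with n_i records and total residuals delta_ij, the
    inter-event term is estimated as
        eta_i = tau**2 * sum_j(delta_ij) / (n_i * tau**2 + phi**2)
    and the intra-event residual is delta_ij - eta_i.
    """
    total = np.asarray(total, dtype=float)
    event_ids = np.asarray(event_ids)
    inter = np.empty_like(total)
    for ev in np.unique(event_ids):
        mask = event_ids == ev
        n = mask.sum()
        inter[mask] = tau**2 * total[mask].sum() / (n * tau**2 + phi**2)
    intra = total - inter
    return inter, intra
```

The inter-event term is constant across all records of an event, and the two parts always sum back to the total residual.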
Examples of the plots are provided here (these are the ones generated in the added unit test for the new residual analysis functions). Note that I made a fake ground-motion dataset for each TRT, given there are no ground motions available for the region covered by the source model used in the SSC testing QA. This is made clear in the sample flatfile to ensure a user does not mistake it for real ground-motion metadata for the test SSC area.
I have updated the generation of the residual analysis HTML and the documentation.
This PR will break the current tests because the MBTK needs to be installed in the environment the CI creates for testing.
Example plots