Raise nem_data_model_start_time floor from 2009 to 2015 #82
Merged
Summary
Cherry-picked from #67 (commit 66189a8): a one-line change to src/nemosis/defaults.py raising the nem_data_model_start_time floor from 2009/07/01 to 2015/01/01, with an inline comment explaining why.
Implements the short-term fix recommended in #66.
Why
nem_data_model_start_time is only consumed by tables with search_type = "all" in processing_info_maps.py, i.e. cumulative-snapshot tables: PARTICIPANT, DUDETAIL, GENCONDATA, SPDREGIONCONSTRAINT, SPDCONNECTIONPOINTCONSTRAINT, SPDINTERCONNECTORCONSTRAINT, LOSSMODEL, LOSSFACTORMODEL, MNSP_INTERCONNECTOR, INTERCONNECTOR, INTERCONNECTORCONSTRAINT, MARKET_PRICE_THRESHOLDS (12 tables total).
For these, _set_up_dynamic_compilers sets start_search = nem_data_model_start_time, then _dynamic_data_fetch_loop iterates monthly from there to end_time. With the floor at 2009-07, the loop probes 66 months of pre-2015 URLs (2009-07 through 2014-12), all of which 404, because AEMO restructured the 2009-2014 archives into one zip per month containing all tables, whereas NEMOSIS expects the per-table monthly zips used from 2015 onward. As #66 notes: "Nemosis just throws an error, then moves on". Raising the floor skips that doomed scan window.
This is a speedup with no incremental data-loss impact: pre-2015 data already doesn't work on master today. The change just makes the failure faster (no minutes of 404 retries) and trims log noise.
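To make the size of the skipped scan window concrete, here is a minimal, self-contained sketch of a monthly scan from a floor date to end_time. It is not the NEMOSIS implementation: month_starts and months_probed are hypothetical helpers, the end_time value is an arbitrary example, and the sketch assumes the scan includes both the floor month and the end month, which may differ from what _dynamic_data_fetch_loop actually does.

```python
from datetime import datetime

# Hypothetical stand-ins for the old and new defaults; the timestamp
# format is assumed to match the one used by the end-to-end tests.
OLD_FLOOR = "2009/07/01 00:00:00"
NEW_FLOOR = "2015/01/01 00:00:00"
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def month_starts(start: datetime, end: datetime):
    """Yield (year, month) pairs from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def months_probed(floor: str, end_time: str) -> int:
    """Count the monthly archives a scan from floor to end_time would try."""
    start_search = datetime.strptime(floor, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    return sum(1 for _ in month_starts(start_search, end))


# Arbitrary example end_time for a search_type = "all" table.
end_time = "2021/06/01 00:00:00"
old_count = months_probed(OLD_FLOOR, end_time)
new_count = months_probed(NEW_FLOOR, end_time)
print(f"probes with old floor: {old_count}")
print(f"probes with new floor: {new_count}")
print(f"probes skipped: {old_count - new_count}")
```

Under these assumptions the number of skipped probes does not depend on end_time, as long as end_time falls in or after 2015: it is simply the number of months between the two floors.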
What's not affected
- search_type = "start_to_end" or "end": those derive start_search from the user-supplied start_time/end_time directly, not from this default.
- tests/end_to_end_table_tests/test_*.py already monkeypatch defaults.nem_data_model_start_time = "2021/05/01 00:00:00" to narrow scan windows; that path still works.
Future work
#66 keeps tracking the long-term aspiration of parsing the whole-of-month zip format AEMO uses for 2009-2014, which would restore access to those years. This PR doesn't close #66 — it implements the short-term mitigation Matt explicitly noted in that issue while leaving the deeper fix open.
No README change
README doesn't currently document a date floor anywhere, and this isn't a behavior change so much as a perf fix for an already-broken case. The inline source comment + #66 is sufficient context for future readers.
Test plan
uv run pytest tests/: 367 passed, 1 skipped, 3 warnings (all pre-existing, unrelated)
Related: #66, supersedes the same change from #67.