Skip to content

Skip/Speed up tests, resolve most open issues#67

Closed
mdavis-xyz wants to merge 30 commits into
UNSW-CEEM:masterfrom
mdavis-xyz:matt-v2
Closed

Skip/Speed up tests, resolve most open issues#67
mdavis-xyz wants to merge 30 commits into
UNSW-CEEM:masterfrom
mdavis-xyz:matt-v2

Conversation

@mdavis-xyz
Copy link
Copy Markdown
Contributor

@mdavis-xyz mdavis-xyz commented Apr 8, 2026

I've had some issues open for a while. As discussed, it's hard to merge them confidently, given that the tests take half a day, and some functionality has broken due to changes to nemweb.

So I've worked to streamline the tests and the code itself. Now a test with an empty cache takes 2h30m. It's 1h40m with a full cache.

Some of the tests, I just skipped. I normally hate it when people do that. Although for now, I think it's worth doing temporarily, so that we can check all these other bug fixes and feature additions.

This fixes:

And I include some currently open pull requests:

I know it's bad style to combine so many unrelated issues into one pull request. However given that the master branch is not easily buildable, some of these do work together. Look through this pull request one commit at a time, if you want to see review each change separately.

Aside from the tests now skipped with @unittest.skip(), all unit tests pass.

I increased the minimum Python version to 3.10, from 3.9, because uv complained about a dependency. 3.9 is now deprecated, so I think this is ok.

@mdavis-xyz mdavis-xyz changed the title Skip/Speed up tests, resolve all open issues Skip/Speed up tests, resolve most open issues Apr 8, 2026
@nick-gorman
Copy link
Copy Markdown
Member

@mdavis-xyz thank you for the PR (and all the other issues and PRs).

This has spurred me into properly fixing the test suite so it can run in seconds not hours, and then setup cicd. Once that is done we can rebase this branch to include the fixed and complete tests. Then push on and merge.

Thanks for your patience!

@nick-gorman
Copy link
Copy Markdown
Member

@mdavis-xyz I've created a bit of a headache by changing the test frame work while this PR is inflight. To save you the messy rebase and to expedite getting this changes on to main I'm going to go ahead and cherry pick, PR, and review the feature changes one at time. Any that I'm unsure about I'll leave as open PRs where we can continue to iterate/discuss.

@nick-gorman
Copy link
Copy Markdown
Member

Heads-up on PR #67

Hey Matt — thanks again for PR #67. We've broken it into ~10 smaller, individually reviewable PRs against master, all of which are now merged. Wanted to give you a quick run-down of how your work flowed through, and (more importantly) the spots where we made calls that deviate from your original. If any of these look wrong on a post-merge read-through, please flag — happy to revisit any of them.

Hopefully, with better testing setup we can have a more sane development and review process going forward :)
(I mean hopefully I will be more responsive and you won't feel forced into massive PRs!)

What landed (in order)

Calls we made that deviate from your code

  • Uniform Chrome 130 UA, not Chrome 80 default + Chrome 120 override for aemo.com.au only. Your per-domain branching works; we went uniform-current across all hosts for simplicity. nemweb doesn't UA-filter so it's harmless there.
  • download_xldownload_xlsx, not download_xml. The file is .xlsx, not XML. Same rename intent, different name.
  • Dropped the except KeyboardInterrupt: raise handlers in downloader.py + gui.py. Reasoning: KeyboardInterrupt is a sibling of Exception in Py3, so except Exception: doesn't catch KI. Your except KI: raise pattern is structurally a no-op against an except Exception: above it — but a bare except: would still swallow it. We found the real cause: a bare except: in static_table (data_fetch_methods.py:301) and custom_tables.py:performance_at_nodal_peak. Narrowed both to except Exception: so KI propagates naturally. If we've misread your design intent here, please flag — handlers can go back in.
  • Flipped keep_csv default from True to False (PR Stream zip downloads, add keep_zip, flip raw-retention defaults to False #87). Your 7753648 had keep_csv=True so downstream CSV consumers got the file without opt-in. Nick asked to flip it for a leaner default cache + symmetry with the new keep_zip=False default. Your rationale still applies — downstream consumers just opt in explicitly now.
  • keep_zip is a new flag, not an automatic always-keep. Your streaming work in 2a5812f left zips on disk after extraction by default (intentional, per Option to keep .zip not csv #56). We turned that into an opt-in keep_zip=True flag so the default cache stays lean; Option to keep .zip not csv #56's slow-internet use case is preserved by setting keep_zip=True.
  • download_to_path / download_to_dir now return a downloaded boolean / (path, downloaded) tuple. Needed so keep_zip=False cleanup only touches zips this call actually wrote — safe for concurrent same-cache use without filesystem-snapshot diffing. (Also caught a latent bug: your download_to_dir accepted force_redo but never forwarded it.)
  • _pre_check_file_is_missing raises requests.HTTPError(404) instead of ValueError (PR Cache parent directory HTML for fast missing-file pre-check #88). Matches the contract PR Adopt requests.Session, bump UA to Chrome 130, raise_for_status everywhere #85 set up — every download function surfaces a 404 as HTTPError. Existing 404-warning callers (run, run_bid_tables, etc) keep working unchanged whether the answer came from the cache or the wire.
  • Dropped the assert file_url.startswith("https://") in the pre-check. NEMOSIS's offline test fixtures legitimately use http:// against the local mock server. Pre-check now returns None for any URL outside the configured nemweb prefix list — same effect for real callers.
  • Kept underscore-aliased module imports in data_fetch_methods.py rather than switching to direct function imports. Project convention throughout the file. Did take your direct _filter_on_column_value import since both call sites changed.
  • Did not re-add your tests/test_data_fetch_methods.py additions. PR Port test suite to offline fixture-based architecture #72 deleted the whole file in favour of offline-fixture tests. Your test_dt_not_string cases were re-implemented in tests/end_to_end_table_tests/test_datetime_inputs.py with the offline fixture, plus a regression test for the bug below.

Bugs in your code we fixed during cherry-pick

(Not blockers — just want you to see them so you're not surprised.)

What we'd love your eye on

  1. The KI / bare-except analysis above. If our reading of Py3 exception semantics is off, the handlers need to go back in.
  2. Anything else where you'd have made a different call. Everything's landed but nothing's set in stone — happy to revert or revisit any deviation if you see it differently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants