
Fix #74: double-encode # in archive URLs (real AEMO files use literal %23) #86

Merged
nick-gorman merged 1 commit into master from fix-issue-74-archive-url-double-encode
May 25, 2026


@nick-gorman
Member

Summary

  • Post-2024-07 PUBLIC_ARCHIVE# monthly MMS archives are stored on nemweb with literal %23 in their on-disk filenames (not #). NEMOSIS sent URLs with single %23 encoding — nemweb decoded that back to #, didn't find the file, returned HTTP 400, and dynamic_data_compiler failed with NoDataToReturn for any post-2024-07 DISPATCHPRICE / DISPATCHLOAD / etc. query on a cold cache.
  • Fix: double-encode # → %2523 in downloader.download_unzip_csv so nemweb decodes it once to %23 and finds the real file. Pre-Aug-2024 PUBLIC_DVD_* filenames don't contain #, so the replace is a no-op for the older path.
  • Offline test suite hid the bug because its fixture zips were named with literal # (disagreeing with how real nemweb stores the same files). Renamed all 48 post-2024-07 fixture zips to %23-form so the offline suite now mirrors real-server layout — and would flag a regression in the encoding step. Also applied the matching %2523 fix in tests/fixtures/build.py::http_get so anyone rebuilding fixtures from real AEMO gets working downloads.
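
A minimal sketch of the encoding step described in the summary, for illustration only. The helper name `encode_archive_url`, the base URL, and the PUBLIC_DVD_* filename are hypothetical; the actual change is a one-line `replace` inside `download_unzip_csv`.

```python
def encode_archive_url(url: str) -> str:
    """Hypothetical helper: double-encode '#' so that a single
    server-side URL-decode yields the literal '%23' stored on disk."""
    return url.replace("#", "%2523")


# Hypothetical base URL and older-style filename, used only for illustration.
base = "https://example.invalid/archive/"
new_style = base + "PUBLIC_ARCHIVE#DISPATCHPRICE#FILE01#202412010000.zip"
old_style = base + "PUBLIC_DVD_DISPATCHPRICE_202401010000.zip"

encoded_new = encode_archive_url(new_style)
encoded_old = encode_archive_url(old_style)

print(encoded_new)  # every '#' is now '%2523'
print(encoded_old)  # unchanged, because the name contains no '#'
```

Because the older filenames contain no `#`, the same call can be applied to every archive URL without branching on the date.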

Root cause (one paragraph)

AEMO's nemweb URL-decodes one level. To match a real on-disk filename that contains literal %23, the HTTP URL needs %2523 — single %23 decodes to #, finds nothing, 400s. Verified directly against the live server:

$ curl -I '…/PUBLIC_ARCHIVE%23DISPATCHPRICE%23FILE01%23202412010000.zip'
HTTP/1.1 400 Bad Request
$ curl -I '…/PUBLIC_ARCHIVE%2523DISPATCHPRICE%2523FILE01%2523202412010000.zip'
HTTP/1.1 200 OK   (2,140,354 bytes)
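
The one-level decode can also be reproduced locally. This sketch assumes that nemweb's decoding behaves like a single pass of Python's `urllib.parse.unquote`; that is a model of the server, not a statement about how it is implemented.

```python
from urllib.parse import unquote

# Filename as stored on nemweb, per the investigation above (literal %23).
on_disk = "PUBLIC_ARCHIVE%23DISPATCHPRICE%23FILE01%23202412010000.zip"

# The two request forms that were tried with curl.
single_encoded = "PUBLIC_ARCHIVE%23DISPATCHPRICE%23FILE01%23202412010000.zip"
double_encoded = "PUBLIC_ARCHIVE%2523DISPATCHPRICE%2523FILE01%2523202412010000.zip"

# One decode pass, standing in for what the server does to the request path.
decoded_single = unquote(single_encoded)  # '%23' becomes '#'
decoded_double = unquote(double_encoded)  # '%25' becomes '%', leaving '%23'

print(decoded_single == on_disk)  # False: the decoded name contains '#'
print(decoded_double == on_disk)  # True: the decoded name matches the stored file
```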

Full root-cause investigation is in my issue comment (Option A there).

Test plan

  • All 222 offline tests pass (uv run pytest tests/end_to_end_table_tests/).
  • Live verification: %23 → 400, %2523 → 200 with real zip body, against nemweb.com.au/Data_Archive/.../MMSDM_2024_12/.../PUBLIC_ARCHIVE…DISPATCHPRICE…zip.
  • CI green across the matrix (3 OSes × 5 Pythons).
  • Optional smoke: spot-check dynamic_data_compiler(start_time="2024/08/15 00:00:00", end_time="2024/08/16 00:00:00", table_name="DISPATCHPRICE", …) end-to-end against the live server on a reviewer's machine. The HTTP-level proof above + offline test pass cover the same pipeline.

Fixes #74.

🤖 Generated with Claude Code

Post-2024-07 PUBLIC_ARCHIVE# monthly MMS files are stored on
nemweb.com.au with literal `%23` characters in their filenames (not
`#`). NEMOSIS's `download_unzip_csv` was sending URLs with single `%23`
encoding — nemweb URL-decodes `%23` to `#`, looks for a `#`-named file,
finds none, and returns HTTP 400. The result: dynamic_data_compiler
fails with NoDataToReturn for any post-2024-07 PUBLIC_ARCHIVE# table
(DISPATCHPRICE, DISPATCHLOAD, etc.) on a cold cache.

To match the real filename on disk the URL needs `%2523` so nemweb
decodes it once to `%23`. Verified directly against the live server:
single `%23` returns 400; `%2523` returns 200 with the real zip body.

Three changes here:

1. `src/nemosis/downloader.py::download_unzip_csv` — change the
   `url.replace("#", "%23")` step to `url.replace("#", "%2523")`. This
   is the only place NEMOSIS percent-encodes URLs at fetch time;
   pre-2024-08 PUBLIC_DVD_* filenames don't contain `#` so the replace
   is a no-op for them.

2. `tests/fixtures/build.py` — apply the same fix in `http_get` (which
   the build pipeline uses to fetch fixtures from real AEMO), and
   write the resulting fixtures to disk under their `%23`-form name
   in `mms_fixture_path` so they match nemweb's actual filename layout.

3. Rename the existing 48 PUBLIC_ARCHIVE# fixture zips in
   `tests/fixtures/data/.../MMSDM/2024_*/` and `MMSDM/2025_*/` from
   `…#…zip` to `…%23…zip` via `git mv`. Required because the offline
   test suite stands up an `http.server` over those files and serves
   them at the URL NEMOSIS requests — once NEMOSIS sends `%2523`, the
   server decodes it to `%23` and needs to find a `%23`-named file on
   disk.

Why the offline tests didn't catch this before: the fixture filenames
used literal `#`, which disagreed with how real nemweb stores the same
files. Python's `http.server` URL-decoded `%23` → `#` and happily
served them, so the encoding mismatch was masked. After this change
the fixture filenames mirror the real on-disk layout and the offline
suite would now flag a regression in the URL encoding.
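
The masking effect can be modelled in a few lines. This is a simplified sketch: `is_served` is a hypothetical stand-in for a file server that decodes the request path once and then looks the name up on disk, not the test suite's actual code.

```python
from urllib.parse import unquote


def is_served(requested: str, files_on_disk: set) -> bool:
    """Hypothetical model of a static file server: decode the
    requested name once, then check whether that file exists."""
    return unquote(requested) in files_on_disk


# Fixture names before and after the rename.
old_fixtures = {"PUBLIC_ARCHIVE#DISPATCHPRICE#FILE01#202412010000.zip"}
new_fixtures = {"PUBLIC_ARCHIVE%23DISPATCHPRICE%23FILE01%23202412010000.zip"}

single_encoded = "PUBLIC_ARCHIVE%23DISPATCHPRICE%23FILE01%23202412010000.zip"
double_encoded = "PUBLIC_ARCHIVE%2523DISPATCHPRICE%2523FILE01%2523202412010000.zip"

# Old layout: the single-encoded request is served, hiding the bug.
masked = is_served(single_encoded, old_fixtures)
# New layout: the single-encoded request is no longer served, so a regression would surface.
regression_caught = not is_served(single_encoded, new_fixtures)
# New layout: the double-encoded request is served.
fix_works = is_served(double_encoded, new_fixtures)

print(masked, regression_caught, fix_works)
```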

All 222 offline tests pass. Live verification:

  $ curl -I '…/PUBLIC_ARCHIVE%23DISPATCHPRICE%23FILE01%23202412010000.zip'
  HTTP/1.1 400 Bad Request
  $ curl -I '…/PUBLIC_ARCHIVE%2523DISPATCHPRICE%2523FILE01%2523202412010000.zip'
  HTTP/1.1 200 OK  (2,140,354 bytes)

Fixes #74.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nick-gorman force-pushed the fix-issue-74-archive-url-double-encode branch from 4c8bcd5 to fe13d34 on May 25, 2026 05:29
nick-gorman merged commit ca18afd into master on May 25, 2026
16 checks passed

Development

Successfully merging this pull request may close these issues.

Monthly MMS dynamic table fetches fail from 2024-08 onward (PUBLIC_ARCHIVE#...) while older PUBLIC_DVD_* months still work
