fix: retry event watcher blocks after RPC failures #349

Open
victortran0904 wants to merge 4 commits into entrius:test from victortran0904:codex/retry-event-watcher-failed-blocks

Conversation

@victortran0904
Contributor

Summary

  • make process_block() report whether block event retrieval succeeded
  • keep the event watcher cursor at the last successfully processed block when a transient RPC failure occurs
  • add regression coverage for clean sync, mid-window failure, and retry after recovery
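
For readers skimming the thread, the cursor behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchWatcher`, `fetch_events`, and the `cursor` attribute are hypothetical names, and the real `ContractEventWatcher` may be structured differently.

```python
from typing import Callable, List


class SketchWatcher:
    """Hypothetical stand-in for the event watcher; not the real class."""

    def __init__(self, fetch_events: Callable[[int], List[dict]], start_block: int = 0):
        self.fetch_events = fetch_events  # assumed RPC wrapper; may raise
        self.cursor = start_block  # last block that was fully processed
        self.events: List[dict] = []

    def process_block(self, block: int) -> bool:
        """Return True only if event retrieval for `block` succeeded."""
        try:
            events = self.fetch_events(block)
        except Exception:
            return False
        self.events.extend(events)
        return True

    def sync_to(self, target: int) -> None:
        """Advance block by block and stop at the first failed block."""
        for block in range(self.cursor + 1, target + 1):
            if not self.process_block(block):
                break  # cursor stays at the last good block, so the next call retries
            self.cursor = block


# Usage: block 3 fails once, then recovers on the next sync.
failing = {3}


def fetch(block: int) -> List[dict]:
    if block in failing:
        raise ConnectionError("rpc down")
    return [{"block": block}]


w = SketchWatcher(fetch)
w.sync_to(5)
cursor_after_failure = w.cursor  # stopped before the failed block
failing.clear()
w.sync_to(5)  # resumes at block 3
```

The point of the sketch is that the cursor only moves after a block succeeds, so a failed block is neither skipped nor processed twice.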

Fixes #201

Note: #339 also touches event_watcher.py for state persistence, so this may need a small rebase if that lands first.

Tests

  • uv run pytest tests/test_event_watcher.py -q
  • uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py
  • uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py
  • git diff --check

Copilot AI review requested due to automatic review settings May 20, 2026 06:12
@xiao-xiao-mao xiao-xiao-mao Bot added the bug Something isn't working label May 20, 2026

Copilot AI left a comment

Pull request overview

This PR fixes ContractEventWatcher.sync_to() so it only advances the cursor past blocks whose event retrieval succeeded, preventing silent event loss when transient RPC failures occur (Fixes #201).

Changes:

  • Make process_block() return a success flag and stop the sync loop on retrieval failures.
  • Advance cursor to the last successfully processed block instead of unconditionally to the end of the window.
  • Add regression tests covering clean sync, mid-window failure, and retry after recovery.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| allways/validator/event_watcher.py | Track last successfully processed block and halt cursor advancement when block/event retrieval fails. |
| tests/test_event_watcher.py | Add tests to ensure cursor stops before a failed block and retries successfully on a subsequent sync. |

Comment thread allways/validator/event_watcher.py
Comment thread allways/validator/event_watcher.py Outdated
Comment thread tests/test_event_watcher.py Outdated
@victortran0904
Contributor Author

Updated this branch to address the actionable Copilot review items:

  • added an explicit warning when sync stops before the requested block range so operators can see the partial catch-up and retry point
  • added regression coverage for get_events raising and then succeeding on a later sync_to call
  • kept the existing sync_to return contract unchanged
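
As a rough illustration of the first two items, the sketch below logs a warning when a sync stops early and uses a mock whose `get_events` raises once and then succeeds. The helper `sync_range`, the logger name, and the message format are assumptions made for this example; they are not taken from the PR diff.

```python
import logging
from unittest.mock import Mock

logger = logging.getLogger("event_watcher_sketch")  # hypothetical logger name


def sync_range(cursor: int, target: int, get_events) -> int:
    """Hypothetical helper: try blocks cursor+1..target and return the new cursor."""
    for block in range(cursor + 1, target + 1):
        try:
            get_events(block)
        except Exception as exc:
            # Make the partial catch-up and the retry point visible to operators.
            logger.warning(
                "sync stopped at block %d (requested %d): %s; next sync retries from block %d",
                cursor, target, exc, block,
            )
            return cursor
        cursor = block
    return cursor


# get_events raises on the first call, then returns an empty event list.
get_events = Mock(side_effect=[ConnectionError("rpc timeout"), []])
first = sync_range(10, 11, get_events)  # block 11 fails, cursor stays at 10
second = sync_range(first, 11, get_events)  # block 11 is retried and succeeds
```

Using `Mock(side_effect=[...])` keeps the failure deterministic, which is one simple way to exercise the retry path without a live RPC endpoint.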

I did not add pruned/missing-block special casing because there does not appear to be a clean existing error taxonomy for that path; string matching provider errors would be brittle.

Verification:

  • uv run pytest tests/test_event_watcher.py -q
  • uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py
  • uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py
  • git diff --check

@victortran0904 victortran0904 force-pushed the codex/retry-event-watcher-failed-blocks branch from ea4391f to 3039b8d Compare May 20, 2026 21:43
@victortran0904
Contributor Author

Rebased this branch onto the latest test branch and resolved the event watcher test import conflict. The PR diff is still scoped to allways/validator/event_watcher.py and tests/test_event_watcher.py.

Verification after rebase:

  • rtk uv run pytest tests/test_event_watcher.py -q -> 32 passed
  • rtk uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py -> passed
  • rtk uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py -> passed
  • rtk git diff --check -> passed

@victortran0904 victortran0904 force-pushed the codex/retry-event-watcher-failed-blocks branch from 3039b8d to 48e663b Compare May 21, 2026 04:41
@victortran0904
Contributor Author

Follow-up pushed in f45965d to address the concern that permanently unavailable historical blocks would be retried indefinitely.

  • Added classification for permanently unavailable historical block/state errors, including non-archive historical block misses.
  • process_block() still returns False for unrelated/transient RPC failures so sync_to() retries those.
  • Added focused tests for permanent historical failures from both get_block_hash() and get_events().
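
The split between permanent and transient failures might look something like the sketch below. Everything here is an assumption for illustration: `HistoricalStateUnavailable` and `is_permanently_unavailable` are hypothetical names, and returning True for a permanent miss (so the cursor can move on) is a guess at the intent, not a description of the actual change in f45965d.

```python
class HistoricalStateUnavailable(Exception):
    """Hypothetical error for a historical block/state a non-archive node no longer has."""


def is_permanently_unavailable(exc: Exception) -> bool:
    """Hypothetical classifier; the real one may inspect provider errors differently."""
    return isinstance(exc, HistoricalStateUnavailable)


def process_block(block: int, fetch_events) -> bool:
    """Return False only for failures that are worth retrying."""
    try:
        fetch_events(block)
    except Exception as exc:
        if is_permanently_unavailable(exc):
            return True  # assumption: retrying cannot help, so let the cursor advance
        return False  # unrelated/transient failure: caller keeps the cursor and retries
    return True


def ok(block: int) -> list:
    return []


def pruned(block: int) -> list:
    raise HistoricalStateUnavailable("state pruned")


def flaky(block: int) -> list:
    raise ConnectionError("timeout")


results = [process_block(1, f) for f in (ok, pruned, flaky)]
```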

Validation:

  • rtk proxy env UV_CACHE_DIR=/private/tmp/uv-cache UV_PROJECT_ENVIRONMENT=/private/tmp/allways-pr349-venv-313 uv run --python 3.13 pytest tests/test_event_watcher.py -q -> 61 passed
  • rtk proxy env UV_CACHE_DIR=/private/tmp/uv-cache UV_PROJECT_ENVIRONMENT=/private/tmp/allways-pr349-venv-313 uv run --python 3.13 ruff check allways/validator/event_watcher.py tests/test_event_watcher.py -> passed
  • rtk proxy env UV_CACHE_DIR=/private/tmp/uv-cache UV_PROJECT_ENVIRONMENT=/private/tmp/allways-pr349-venv-313 uv run --python 3.13 ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py -> passed
  • rtk git diff --check -> passed

Note: default uv run selected Python 3.14, which cannot build bittensor-commit-reveal due to PyO3 support limits, so validation was pinned to Python 3.13.

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[S1] event_watcher silently drops events on transient RPC failure

2 participants