[codex] Fix Windows SIGALRM markdown indexing#22
Merged
Conversation
30de204 to
ab2bafc
Compare
Wrap the POSIX SIGALRM/alarm setup in a single `_read_timeout` context manager that no-ops on platforms (Windows) where those signals are unavailable. This removes the duplicated error-handling branches in `safe_read_text` / `safe_read_bytes` and drops the `_handle_read_error` helper, the unreachable trailing `raise` after it, and the `vars(signal)` + `cast` workarounds previously needed to keep mypy happy on Windows. Also make the new CLI regression test independent of terminal width by pinning COLUMNS so Rich does not wrap the error line off-screen, and annotate `monkeypatch` with `pytest.MonkeyPatch` for consistency with the rest of the suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a Windows smoke run (Python 3.11) to the test matrix so that POSIX-only regressions like the SIGALRM markdown-indexing bug fixed in this PR get caught at PR time instead of in production. Ubuntu keeps the full 3.10/3.11/3.12 matrix; Codecov upload stays restricted to the single ubuntu/3.11 job. `fail-fast: false` so a Windows failure does not cancel the Linux matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a macOS smoke run (Python 3.11) alongside the new Windows job. macOS is a primary developer platform for this project, so catching Darwin-specific quirks (BSD libc differences, case-insensitive filesystem, signal behavior) at PR time is worth the extra runner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The macOS test job was failing with:
AttributeError: 'sqlite3.Connection' object has no attribute
'enable_load_extension'
librarian's storage layer calls `conn.enable_load_extension(True)` to
load the sqlite-vec extension. That method only exists when CPython was
compiled with `--enable-loadable-sqlite-extensions`. The python.org
build that `actions/setup-python` ships on macOS is compiled without
that flag, so any test path that touches the database fails — 24
unrelated failures, all the same root cause.
Switch the test job to uv-managed Python (python-build-standalone),
which enables loadable sqlite extensions on every platform. This is a
single `python-version` input on `astral-sh/setup-uv`, so it also
removes the now-redundant `actions/setup-python` step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous attempt removed actions/setup-python but uv still resolved the runner's pre-installed Python (/usr/local/bin/python3.11) because the version matched the matrix entry. That Python is the python.org build, compiled without --enable-loadable-sqlite-extensions, so sqlite-vec failed to load and 24 tests crashed on macOS. Add --python-preference=only-managed to the install step so uv downloads a python-build-standalone interpreter instead. Those builds enable loadable sqlite extensions on every platform. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
torresmateo
approved these changes
May 12, 2026
Collaborator
torresmateo
left a comment
There was a problem hiding this comment.
My only question is why are we using 3.11 in some of these?
| include: | ||
| # Smoke-test on Windows to catch POSIX-only regressions (e.g. SIGALRM). | ||
| - os: windows-latest | ||
| python-version: "3.11" |
Spartee
approved these changes
May 13, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #21
This fixes Windows Markdown indexing by avoiding unconditional use of POSIX-only
signal.SIGALRM/signal.alarmin parser file reads. It also makeslibr add <directory>return a non-zero exit code when the directory indexing result contains errors, so automation can detect partial indexing failures.Root Cause
safe_read_text()installed a timeout withsignal.SIGALRMbefore reading every text file. Windows Python does not exposeSIGALRM, so Markdown parsing failed before the file content could be read. Directory indexing collected that failure inerrors, but the CLI still printed a success panel and exited with status code 0.Implementation Approach
librarian.processing.parsers.base.signal.alarmtimeout path on platforms that provide it.SIGALRM/alarmare unavailable.safe_read_bytes()because it used the same POSIX-only mechanism.libr add <directory>to print an error status and raisetyper.Exit(1)when indexing returns errors.Tests Added
safe_read_text()andsafe_read_bytes()withsignal.SIGALRMremoved from the runtime.libr add <directory>exits non-zero when indexing reports errors.Validation
uv run --extra dev pytest-> 75 passed, 8 skipped.uv run --extra dev ruff check --diff librarian/processing/parsers/base.py librarian/cli.py tests/test_file_read_errors.py tests/test_cli.py-> no diffs.uv run --extra dev mypy librarian/processing/parsers/base.py-> success.signal.SIGALRMat runtime, patched a fake embedder, and confirmedindex_directory_to_library()indexed one Markdown file withindexed: 1anderrors: [].I also tried a raw CLI smoke with the real local embedding model in an isolated temporary HOME. It reached the indexing pipeline and correctly exited non-zero on failure, but the run was blocked by Hugging Face model cache setup in the temporary HOME rather than by Markdown parsing.
Remaining Limitations
On platforms without
SIGALRM, file reads no longer have the POSIX alarm timeout. This preserves Windows compatibility without introducing a broader threaded/process timeout refactor.