Add HTTP timeout to read_file network fetch by Spectual · Pull Request #539 · openai/tiktoken

Spectual · 2026-05-09T06:09:49Z

Problem

read_file in tiktoken/load.py calls requests.get(blobpath) without a timeout= argument when fetching tokenizer data over HTTP/HTTPS:

# tiktoken/load.py (current main)
resp = requests.get(blobpath)
resp.raise_for_status()
return resp.content

requests documents this as a footgun in production code:

Without a timeout, your code may hang for minutes or more.
— https://requests.readthedocs.io/en/latest/user/quickstart/#timeouts

In tiktoken's case, the failure mode is encoding_for_model("gpt-4o") (or any other not-yet-cached model) silently hanging on first use whenever the network path to openaipublic.blob.core.windows.net is unhealthy — DNS resolution stalls, SYN black-holes, captive portals, broken corporate proxies, mid-flight TCP resets. The user sees a wedged process with no traceback and no way to interrupt short of Ctrl-C/SIGKILL.

This is independent of #514 (TIKTOKEN_OFFLINE): that PR short-circuits before any network call when the user opts in to offline mode. This PR fixes the case where the network call does happen but the peer doesn't respond.
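To illustrate the "peer doesn't respond" case in isolation, here is a minimal sketch that uses only the standard-library socket module rather than requests (a deliberate substitution so it runs without any third-party dependency). The function name wait_on_silent_peer is hypothetical and not part of tiktoken. It opens a local listener that never replies and shows that a socket timeout bounds the wait that would otherwise be unbounded.

```python
import socket
import time


def wait_on_silent_peer(timeout_s: float):
    """Connect to a local listener that never replies; return seconds waited.

    Hypothetical helper for illustration only. Returns None if the peer
    unexpectedly sends data or closes the connection.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)               # we never call accept() and never reply
    host, port = server.getsockname()

    # The timeout given here also applies to later send/recv calls.
    client = socket.create_connection((host, port), timeout=timeout_s)
    try:
        client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
        start = time.monotonic()
        try:
            client.recv(1)         # without a timeout this would block forever
            return None
        except socket.timeout:
            return time.monotonic() - start
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    waited = wait_on_silent_peer(0.2)
    print(f"gave up after {waited:.2f}s instead of hanging")
```

The same principle applies one layer up: requests hands the timeout= value down to the underlying socket operations, so passing it turns an indefinite wait into an exception.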

Fix

Pass an explicit timeout (default 60 s) to requests.get, configurable via the TIKTOKEN_HTTP_TIMEOUT environment variable:

try:
    timeout: float | None = float(os.environ.get("TIKTOKEN_HTTP_TIMEOUT", "60"))
except ValueError:
    timeout = 60.0
resp = requests.get(blobpath, timeout=timeout)

Falls back to the default if the env var can't be parsed as a float, so a malformed value can't itself crash the tokenizer download.
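The parsing and fallback rule described above can be sketched as a small standalone function. The helper name resolve_http_timeout and the environ parameter are hypothetical, introduced here only so the logic can be exercised without touching the real process environment; the PR itself inlines this logic in read_file.

```python
import os

DEFAULT_TIMEOUT_S = 60.0  # default stated in the PR description


def resolve_http_timeout(environ=None) -> float:
    """Return the HTTP timeout in seconds from TIKTOKEN_HTTP_TIMEOUT.

    Hypothetical helper: `environ` defaults to os.environ and exists only
    to make the function easy to call with a plain dict.
    """
    env = os.environ if environ is None else environ
    raw = env.get("TIKTOKEN_HTTP_TIMEOUT", "60")
    try:
        return float(raw)
    except ValueError:
        # Unparseable value (e.g. "abc" or an empty string): use the default.
        return DEFAULT_TIMEOUT_S


if __name__ == "__main__":
    print(resolve_http_timeout({}))
    print(resolve_http_timeout({"TIKTOKEN_HTTP_TIMEOUT": "2.5"}))
    print(resolve_http_timeout({"TIKTOKEN_HTTP_TIMEOUT": "abc"}))
```

Note that this sketch, like the snippet above, only guards against values that float() rejects. Strings such as "0" or "-5" parse successfully and are passed through unchanged.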

Test plan

tests/test_load.py (new) covers three regression cases — default 60s, env override, unparseable env fallback — by patching requests via sys.modules so no real network traffic is generated.
Confirm existing tests still pass (CI).
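Confirm that, with TIKTOKEN_HTTP_TIMEOUT=2, encoding_for_model("gpt-4o") against a sinkhole IP (e.g. 127.0.0.2) now raises a requests.exceptions.Timeout subclass instead of hanging (ConnectTimeout if the connection is never established, ReadTimeout if the peer accepts the connection but never responds).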
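The three regression cases in the test plan might look roughly like the sketch below. It is not the contents of tests/test_load.py: read_file_http is a hypothetical stand-in for the HTTP branch of read_file, and timeout_seen_by_requests is a hypothetical test helper. The sketch assumes requests is imported inside the function at call time, which is what makes replacing the sys.modules entry sufficient to intercept the call.

```python
import os
import sys
import types
from unittest import mock


def read_file_http(blobpath: str) -> bytes:
    """Hypothetical stand-in for the HTTP branch of tiktoken's read_file."""
    import requests  # looked up in sys.modules at call time

    try:
        timeout = float(os.environ.get("TIKTOKEN_HTTP_TIMEOUT", "60"))
    except ValueError:
        timeout = 60.0
    resp = requests.get(blobpath, timeout=timeout)
    resp.raise_for_status()
    return resp.content


def timeout_seen_by_requests(env: dict) -> float:
    """Call read_file_http with a fake requests module; return the timeout used."""
    fake_requests = types.ModuleType("requests")
    fake_requests.get = mock.Mock(return_value=mock.Mock(content=b"data"))
    # Swap in the fake module and a controlled environment; both patches
    # are undone when the with-block exits, and no real request is sent.
    with mock.patch.dict(sys.modules, {"requests": fake_requests}), \
            mock.patch.dict(os.environ, env, clear=True):
        assert read_file_http("https://example.invalid/blob") == b"data"
    return fake_requests.get.call_args.kwargs["timeout"]


if __name__ == "__main__":
    assert timeout_seen_by_requests({}) == 60.0                                # default
    assert timeout_seen_by_requests({"TIKTOKEN_HTTP_TIMEOUT": "2"}) == 2.0     # override
    assert timeout_seen_by_requests({"TIKTOKEN_HTTP_TIMEOUT": "oops"}) == 60.0 # fallback
    print("all three cases pass")
```

Asserting on the timeout= keyword that the fake get receives checks the behavior the PR cares about without depending on how the real library enforces the timeout.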

`read_file` calls `requests.get(blobpath)` without an explicit timeout when fetching tokenizer data over HTTP/HTTPS. Without a timeout, the request blocks indefinitely on DNS failures, SYN black-holes, TCP resets, or unresponsive proxies — silently hanging `encoding_for_model` on first use with no way to interrupt short of killing the process. Pass a default 60-second timeout, configurable via the `TIKTOKEN_HTTP_TIMEOUT` environment variable. Falls back to the default if the env var can't be parsed as a float, so a malformed value can't crash the tokenizer download. Add `tests/test_load.py` with three regression cases: default, env override, and unparseable env fallback. Each replaces `requests.get` via `sys.modules` so no real network traffic is made.

Spectual wants to merge 1 commit into openai:main from Spectual:fix/load-add-http-timeout