Merged
22 changes: 22 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -6,6 +6,28 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project uses **CalVer `YY.M.PP`** (PEP 440 may normalise patch numbers
for the Python wheel — e.g. `26.06.00` → `26.6.0`).
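
The normalisation mentioned above can be illustrated with a minimal sketch. It assumes purely numeric dotted segments (no pre-release or local suffixes), and the helper names `normalise_release` and `tag_to_floor` are hypothetical, not part of this project or of any packaging library.

```python
def normalise_release(version: str) -> str:
    """Drop leading zeros from each dotted numeric segment, as PEP 440 does."""
    return ".".join(str(int(part)) for part in version.split("."))


def tag_to_floor(tag: str) -> str:
    """Turn a git tag such as 'v26.06.104' into a '>=26.6.104' requirement floor."""
    return ">=" + normalise_release(tag.removeprefix("v"))


print(normalise_release("26.06.00"))  # 26.6.0
print(tag_to_floor("v26.06.104"))     # >=26.6.104
```

This is why a git tag written as `v26.06.104` and a dependency floor written as `>=26.6.104` refer to the same release.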

## [26.6.11] - 2026-06-15

### Fixed

- **A malformed persisted extraction no longer dumps a raw traceback in the
worker log.** `ExtractionWorker._process` reconstructed the typed request
(`_build_request` → pydantic `model_validate`) *outside* its `try/except`, so an
invalid stored `schema_json`/`options_json` raised a `ValidationError` that
escaped to the poll loop's `logger.exception(...)` and printed a full stack
trace. The reconstruction now runs inside the guarded block, where
`_is_permanent()` classifies it as terminal and the job is marked
`permanent_error` and logged cleanly — no traceback.
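
The shape of this fix can be sketched with the standard library alone. Everything below is a hypothetical stand-in: `build_request`, `is_permanent`, and `process` only mimic the roles of `_build_request`, `_is_permanent()`, and `_process`, a plain `ValueError` stands in for pydantic's `ValidationError`, and the `"succeeded"` status is invented for illustration. How the real `_is_permanent()` classifies exceptions is an assumption here.

```python
import json
import logging

logger = logging.getLogger("worker_sketch")


def build_request(row: dict) -> dict:
    """Hypothetical stand-in for _build_request: parse and validate stored JSON."""
    schema = json.loads(row["schema_json"])  # json.JSONDecodeError is a ValueError
    if not isinstance(schema, dict):
        raise ValueError("schema_json must decode to an object")
    return {"id": row["id"], "schema": schema}


def is_permanent(exc: Exception) -> bool:
    """Assumed rule: validation-style errors will not succeed on retry."""
    return isinstance(exc, ValueError)


def process(row: dict) -> str:
    """Return the status the job ends up with."""
    try:
        # Reconstruction happens inside the guard, so a bad payload is classified
        # below instead of escaping to whoever called process().
        request = build_request(row)
        # ... the real worker would run the extraction with `request` here ...
        return "succeeded"
    except Exception as exc:
        if is_permanent(exc):
            # One clean log line, no stack trace.
            logger.warning("extraction %s failed permanently: %s", row["id"], exc)
            return "permanent_error"
        raise  # unexpected faults still propagate
```

With the call to `build_request` placed before the `try`, the same malformed row would raise straight out of `process`, which is the behaviour this entry describes as fixed.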

### Changed

- **Upgraded pyfly to `v26.06.104`** for framework-level clean error reporting:
expected client/domain faults (validation, business-rule, auth — the 4xx
family) are now logged at WARNING without a stack trace across the CQRS handlers
and the web request log, and the pyfly CLI prints a clean `Error: ...` line
instead of a traceback (`--debug` / `PYFLY_DEBUG` restores the full trace).
Dependency pin and floor moved to `v26.06.104` / `>=26.6.104`.
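
As a rough illustration of the logging policy described in this entry (not pyfly's actual implementation or API), the sketch below logs expected client faults at WARNING without a trace and reserves full tracebacks for unexpected failures. `ClientFault`, `handle`, and the `debug` parameter are hypothetical names; the `debug` flag only loosely mirrors what `--debug` / `PYFLY_DEBUG` is described as doing.

```python
import logging

logger = logging.getLogger("request_log_sketch")


class ClientFault(Exception):
    """Hypothetical base class for expected 4xx-style faults."""

    status_code = 400


def handle(func, debug: bool = False) -> dict:
    """Run func() and translate failures into a response-like dict."""
    try:
        return {"status": 200, "body": func()}
    except ClientFault as exc:
        # Expected fault: WARNING level, traceback attached only in debug mode.
        logger.warning("client fault (%s): %s", exc.status_code, exc, exc_info=debug)
        return {"status": exc.status_code, "error": str(exc)}
    except Exception:
        # Unexpected fault: keep the full traceback for diagnosis.
        logger.exception("unexpected failure")
        return {"status": 500, "error": "internal error"}
```

The design choice is to let the exception type, rather than the call site, decide how noisy the log entry is.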

## [26.6.10] - 2026-06-15

### Changed
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -2,7 +2,7 @@
name = "flydocs"
# CalVer YY.MM.PP -- bumped per release. Note that PEP 440 normalises
# ``26.05.01`` -> ``26.5.1`` in the built wheel filename.
-version = "26.6.10"
+version = "26.6.11"
description = "Pure-multimodal Intelligent Document Processing service: structured fields + bounding boxes, validation, authenticity checks, LLM judge, and a business-rule engine. Sync + queue-backed async APIs over fireflyframework-pyfly and -agentic. Part of Firefly OperationOS, platform-agnostic by design."
readme = "README.md"
requires-python = ">=3.13"
@@ -19,7 +19,7 @@ dependencies = [
# so a fresh ``uv sync`` is enough to boot the full stack. The ``web``
# extra declares starlette + uvicorn, which the worker health server
# imports directly; the floor carries ``pyfly.actuator.install_health_indicators``.
-"pyfly[fastapi,web,observability,security,data-relational,postgresql,eda,redis,client,scheduling,cli]>=26.6.103",
+"pyfly[fastapi,web,observability,security,data-relational,postgresql,eda,redis,client,scheduling,cli]>=26.6.104",

# GenAI metaframework -- FireflyAgent with multimodal content (BinaryContent/ImageUrl)
# over pydantic-ai. Pulls in the OpenAI / Anthropic / Bedrock providers via pydantic-ai-slim.
@@ -131,7 +131,7 @@ override-dependencies = [
# (vestigial) ./vendor clone + Dockerfile BuildKit context for pyfly are now
# no-ops — the path-rewrite sed no longer matches a git source — exactly as for
# agentic; they can be removed in a later cleanup.
-pyfly = { git = "https://github.com/fireflyframework/fireflyframework-pyfly.git", tag = "v26.06.103" }
+pyfly = { git = "https://github.com/fireflyframework/fireflyframework-pyfly.git", tag = "v26.06.104" }
fireflyframework-agentic = { git = "https://github.com/fireflyframework/fireflyframework-agentic.git", tag = "v26.05.30" }

[tool.hatch.build.targets.wheel]
2 changes: 1 addition & 1 deletion src/flydocs/__init__.py
@@ -24,4 +24,4 @@
`PromptRegistry`).
"""

-__version__ = "26.6.10"
+__version__ = "26.6.11"
49 changes: 27 additions & 22 deletions src/flydocs/core/services/workers/job_worker.py
@@ -241,30 +241,35 @@ async def _process(self, extraction_id: str) -> None:
             extraction_id=row.id,
             attempt=attempts,
         )
-        request = self._build_request(row)
-        # Capture the original intent BEFORE we mutate the request: we
-        # need to know whether the caller wanted bbox refinement so we
-        # can publish the post-processing event afterwards, even if we
-        # skip the inline node below.
-        wants_bbox_refine = bool(getattr(request.options.stages, "bbox_refine", False))
-        if wants_bbox_refine:
-            # Architectural decision: on the async path, skip the inline
-            # bbox_refine node entirely. The dedicated BboxRefineWorker
-            # picks up the post-processing event we publish below and
-            # grounds bboxes there. Running both wastes minutes of CPU
-            # and LLM tokens on duplicate work — and when the inline
-            # step times out (which it does on multi-PDF bundles) the
-            # pipeline framework marks the node as failed, which is
-            # misleading because the out-of-band path recovers
-            # transparently. The :class:`BboxRefiner` is idempotent
-            # (already-grounded fields are skipped on re-run), so even
-            # if both paths execute the work won't double up — but
-            # bypassing inline saves the latency outright.
-            stages_skipped = request.options.stages.model_copy(update={"bbox_refine": False})
-            options_skipped = request.options.model_copy(update={"stages": stages_skipped})
-            request = request.model_copy(update={"options": options_skipped})
         started = time.monotonic()
         try:
+            # Reconstruct the typed request from the persisted row INSIDE the
+            # try: a malformed schema/options payload makes pydantic raise a
+            # ValidationError, which _is_permanent() treats as a terminal
+            # failure (marked permanent_error) — instead of escaping to the
+            # poll loop's logger.exception() and dumping a raw traceback.
+            request = self._build_request(row)
+            # Capture the original intent BEFORE we mutate the request: we
+            # need to know whether the caller wanted bbox refinement so we
+            # can publish the post-processing event afterwards, even if we
+            # skip the inline node below.
+            wants_bbox_refine = bool(getattr(request.options.stages, "bbox_refine", False))
+            if wants_bbox_refine:
+                # Architectural decision: on the async path, skip the inline
+                # bbox_refine node entirely. The dedicated BboxRefineWorker
+                # picks up the post-processing event we publish below and
+                # grounds bboxes there. Running both wastes minutes of CPU
+                # and LLM tokens on duplicate work — and when the inline
+                # step times out (which it does on multi-PDF bundles) the
+                # pipeline framework marks the node as failed, which is
+                # misleading because the out-of-band path recovers
+                # transparently. The :class:`BboxRefiner` is idempotent
+                # (already-grounded fields are skipped on re-run), so even
+                # if both paths execute the work won't double up — but
+                # bypassing inline saves the latency outright.
+                stages_skipped = request.options.stages.model_copy(update={"bbox_refine": False})
+                options_skipped = request.options.model_copy(update={"stages": stages_skipped})
+                request = request.model_copy(update={"options": options_skipped})
             result = await asyncio.wait_for(
                 self._orchestrator.execute(request, extraction_id=row.id),
                 timeout=self._settings.async_timeout_s,
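
The capture-then-strip pattern in this hunk can be tried in isolation with a small sketch. It swaps pydantic models for frozen dataclasses, with `dataclasses.replace` playing the role of `model_copy(update=...)`. The `Stages`, `Options`, and `Request` classes, the `strip_inline_bbox_refine` helper, and the stub `execute` coroutine are all hypothetical stand-ins, not the project's types.

```python
import asyncio
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Stages:
    bbox_refine: bool = False


@dataclass(frozen=True)
class Options:
    stages: Stages = field(default_factory=Stages)


@dataclass(frozen=True)
class Request:
    options: Options = field(default_factory=Options)


def strip_inline_bbox_refine(request: Request) -> tuple[Request, bool]:
    """Record whether bbox refinement was requested, then disable it inline."""
    wants = bool(getattr(request.options.stages, "bbox_refine", False))
    if wants:
        # Rebuild the nested immutable objects from the inside out.
        stages = replace(request.options.stages, bbox_refine=False)
        options = replace(request.options, stages=stages)
        request = replace(request, options=options)
    return request, wants


async def run(request: Request, timeout_s: float) -> str:
    request, wants = strip_inline_bbox_refine(request)

    async def execute() -> str:  # stand-in for the orchestrator call
        await asyncio.sleep(0)
        return f"inline bbox_refine={request.options.stages.bbox_refine}"

    # Bound the whole execution, as the worker does with asyncio.wait_for.
    result = await asyncio.wait_for(execute(), timeout=timeout_s)
    if wants:
        result += "; post-processing event would be published"
    return result
```

Because the originals are never mutated, the caller's intent (`wants`) survives even though the request that actually runs has the stage switched off.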
8 changes: 4 additions & 4 deletions uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.