fix: improve truncation-aware parse failure logging #754

Open
stepwise-ai-dev wants to merge 2 commits into NVIDIA-NeMo:main from stepwise-ai-dev:stepwise-ai-dev/fix/411-truncation-aware-parse-failure-logging

Conversation

@stepwise-ai-dev

Summary

Fixes #411 by surfacing an actionable message when a parse or recipe failure follows a model response that ended because of max_tokens.

Changes

  • Propagates truncation metadata through generation validation errors and record-drop warnings.
  • Normalizes Anthropic stop_reason into canonical completion choice finish_reason and prefers canonical finish reasons before raw response fallback.
  • Adds regression coverage for sync generation, async generation, Anthropic stop reasons, sync DatasetBuilder logging, and async scheduler row-drop logging.
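
The normalization in the second bullet can be pictured with a minimal sketch. It assumes an Anthropic Messages-style payload with a top-level `stop_reason` and a list of `content` blocks. `CompletionChoice` and `normalize_anthropic_response` are hypothetical names for illustration, not the project's actual types or its `parse_anthropic_response` implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CompletionChoice:
    """Hypothetical canonical choice; not the project's real type."""

    text: str
    finish_reason: Optional[str] = None


def normalize_anthropic_response(raw: dict[str, Any]) -> list[CompletionChoice]:
    """Build one canonical choice and carry stop_reason over as finish_reason."""
    # Concatenate only the text blocks; other block types are ignored here.
    text = "".join(
        block.get("text", "")
        for block in raw.get("content", [])
        if block.get("type") == "text"
    )
    # The stop reason is preserved unchanged (for example "max_tokens"),
    # so later checks can read it from the canonical field.
    return [CompletionChoice(text=text, finish_reason=raw.get("stop_reason"))]
```

With the stop reason on the canonical choice, downstream truncation checks no longer need to inspect the provider-specific payload.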

Validation

  • uv run --group dev pytest packages/data-designer-engine/tests/engine/models/test_facade.py packages/data-designer-engine/tests/engine/models/test_model_errors.py packages/data-designer-engine/tests/engine/models/clients/test_anthropic.py packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py
  • make check-engine

Attention Areas

  • packages/data-designer-engine/src/data_designer/engine/models/facade.py - truncation detection now treats finish_reason=length and finish_reason=max_tokens as max-token truncation.
  • packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/anthropic_translation.py - Anthropic stop_reason is now preserved as choice finish_reason.

Fixes #411

Normalize Anthropic stop reasons into completion choices and prefer canonical finish_reason metadata when detecting max_tokens truncation. Add async scheduler coverage so dropped rows retain the actionable max_tokens guidance.
@stepwise-ai-dev requested a review from a team as a code owner June 16, 2026 21:28
@github-actions

Contributor

Linked Issue Check

Issue #411 has not been triaged yet. A maintainer needs to review
the issue and add the triaged label before this PR can be merged.

You can continue working on the PR in the meantime. The check will
re-run automatically once the issue is triaged.

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Contributor

Greptile Summary

This PR surfaces actionable truncation guidance when a parse or recipe failure is caused by a model response that hit max_tokens. It propagates a truncated_by_max_tokens flag from the detection point in facade.py through the error hierarchy to both the user-facing handle_llm_exceptions message and the per-record drop warning in dataset_builder.py.

  • Truncation detection (facade.py): _response_was_truncated_by_max_tokens checks the canonical choices[0].finish_reason first ("length" for OpenAI, "max_tokens" for Anthropic), then falls back to the raw response dict for clients that don't yet populate the canonical field.
  • Anthropic normalization (anthropic_translation.py): parse_anthropic_response now populates choices[0].finish_reason from stop_reason, so the canonical path handles Anthropic without touching the raw response.
  • Error propagation (errors.py, dataset_builder.py): GenerationValidationFailureError and ModelGenerationValidationFailureError both gain a truncated_by_max_tokens field; handle_llm_exceptions and _format_worker_failure_warning each emit a distinct message when the field is set.
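
To illustrate the error-propagation bullet, here is a minimal sketch of an exception that carries a truncation flag and a formatter that forks on it. `GenerationValidationFailure` and `format_user_message` are hypothetical stand-ins, and the message wording is invented for illustration; neither is taken from the PR's errors.py.

```python
class GenerationValidationFailure(Exception):
    """Hypothetical stand-in for the PR's validation failure errors."""

    def __init__(self, detail: str, truncated_by_max_tokens: bool = False) -> None:
        super().__init__(detail)
        self.detail = detail
        # Defaults to False so existing raise sites keep their behavior.
        self.truncated_by_max_tokens = truncated_by_max_tokens


def format_user_message(exc: GenerationValidationFailure) -> str:
    """Pick the truncation-specific message only when the flag is set."""
    if exc.truncated_by_max_tokens:
        return (
            f"{exc.detail} The response ended because it hit max_tokens; "
            "consider raising max_tokens or requesting a shorter output."
        )
    return f"{exc.detail} The response could not be parsed or validated."
```

Because the flag defaults to `False`, the generic message is unchanged for failures that are unrelated to truncation.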

Confidence Score: 5/5

Safe to merge — the change is additive and observability-only; it never alters control flow or retry decisions, only the message attached to an already-raised error.

The detection logic is correct for both OpenAI and Anthropic stop reasons, the canonical path is preferred over the raw fallback, and all error types involved are properly extended. The six-case parametrized test in test_facade.py covers every branch of _response_was_truncated_by_max_tokens, and the async scheduler and dataset builder tests confirm end-to-end log output. No existing behavior is changed when truncation is absent.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-engine/src/data_designer/engine/models/facade.py | Adds _response_was_truncated_by_max_tokens helper and propagates truncated_by_max_tokens to all _build_generation_validation_error callsites in both sync and async generate loops; logic is correct and well-guarded. |
| packages/data-designer-engine/src/data_designer/engine/models/errors.py | Adds truncated_by_max_tokens field to both GenerationValidationFailureError and ModelGenerationValidationFailureError, and forks handle_llm_exceptions to emit a distinct truncation-specific message when the flag is set. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/anthropic_translation.py | Populates choices[0].finish_reason from Anthropic stop_reason so the canonical truncation check in facade.py can detect max_tokens without falling back to the raw response dict. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Appends a truncation-specific hint to the per-record drop warning when exc.truncated_by_max_tokens is truthy; uses getattr to stay safe against exception types that don't carry the attribute. |
| packages/data-designer-engine/tests/engine/models/test_facade.py | Adds parametrized tests for all six truncation-detection paths for both sync and async generate. |
| packages/data-designer-engine/tests/engine/models/clients/test_anthropic.py | Adds assertions that choices[0].finish_reason is populated for text, max_tokens, and tool_use stop reasons. |
| packages/data-designer-engine/tests/engine/models/test_model_errors.py | Adds regression for truncation-specific error message path and extends existing test to assert truncated_by_max_tokens=False on the non-truncated path. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py | New test drives _worker_error_callback with a truncated-parse-failure exception and verifies the warning text contains both the truncation cause and the max_tokens remediation advice. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py | Adds MockTruncatedParseFailureGenerator and an async scheduler test that confirms the row is dropped and the truncation guidance appears in the warning log. |
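
The getattr guard mentioned for dataset_builder.py can be sketched as follows. `format_drop_warning` and `TruncatedParseFailure` are hypothetical names and the warning text is illustrative; the PR's `_format_worker_failure_warning` may be structured differently.

```python
class TruncatedParseFailure(Exception):
    """Hypothetical exception that carries the truncation flag."""

    truncated_by_max_tokens = True


def format_drop_warning(record_index: int, exc: Exception) -> str:
    """Build a per-record drop warning, adding a hint for truncated responses."""
    message = f"Dropping record {record_index}: {exc}"
    # getattr with a default keeps this safe for exception types
    # that do not define truncated_by_max_tokens at all.
    if getattr(exc, "truncated_by_max_tokens", False):
        message += " (response was truncated by max_tokens; try increasing max_tokens)"
    return message


# Usage: a plain ValueError gets the generic warning, the flagged error gets the hint.
generic_warning = format_drop_warning(3, ValueError("invalid JSON"))
truncated_warning = format_drop_warning(4, TruncatedParseFailure("invalid JSON"))
```

Reading the attribute with a default means the warning path never raises `AttributeError`, whatever exception type a worker reports.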

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Model response received] --> B[ParserException raised]
    B --> C[_response_was_truncated_by_max_tokens]
    C --> D{canonical choices finish_reason is length or max_tokens?}
    D -- Yes --> E[truncated = True]
    D -- No --> F{raw response available?}
    F -- No --> G[truncated = False]
    F -- Yes --> H{raw choices finish_reason is length?}
    H -- Yes --> E
    H -- No --> I{raw stop_reason is max_tokens?}
    I -- Yes --> E
    I -- No --> G
    E --> J[_build_generation_validation_error with truncated=True]
    G --> K[_build_generation_validation_error with truncated=False]
    J --> L[handle_llm_exceptions emits truncation-specific message]
    K --> M[handle_llm_exceptions emits generic message]
    L --> N[ModelGenerationValidationFailureError truncated=True]
    M --> O[ModelGenerationValidationFailureError truncated=False]
    N --> P[dataset_builder warning includes max_tokens hint]
    O --> Q[dataset_builder warning generic]
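
The decision flow in the chart can be sketched as a small predicate. This is a minimal illustration under assumed input shapes (a canonical finish reason string plus an optional raw response dict); `was_truncated_by_max_tokens` is a hypothetical name and signature, not the PR's `_response_was_truncated_by_max_tokens`.

```python
from typing import Any, Optional

# Finish reasons treated as max-token truncation: OpenAI-style "length"
# and Anthropic-style "max_tokens".
MAX_TOKEN_FINISH_REASONS = frozenset({"length", "max_tokens"})


def was_truncated_by_max_tokens(
    canonical_finish_reason: Optional[str],
    raw: Optional[dict[str, Any]] = None,
) -> bool:
    """Check the canonical finish reason first, then fall back to the raw response."""
    if canonical_finish_reason in MAX_TOKEN_FINISH_REASONS:
        return True
    if not raw:
        return False
    # Raw OpenAI-style payload: choices[0].finish_reason == "length".
    choices = raw.get("choices") or []
    if choices and isinstance(choices[0], dict):
        if choices[0].get("finish_reason") == "length":
            return True
    # Raw Anthropic-style payload: top-level stop_reason == "max_tokens".
    return raw.get("stop_reason") == "max_tokens"
```

Checking the canonical field first keeps provider-specific knowledge confined to the fallback branch, which only matters for clients that do not yet populate the canonical field.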

Reviews (1): Last reviewed commit: "fix: use finish reasons for truncation g..."

@stepwise-ai-dev

Author

Thanks for the automated review. Greptile, DCO, semantic title, and the agentic CI gate are now passing. The only failing check is the linked-issue gate because #411 does not yet have the maintainer-added triaged label; happy to adjust the PR if maintainers want any changes before/after triage.

Development

Successfully merging this pull request may close these issues.

Improve record failure logging for max_tokens-truncated parse/recipe failures