Skip to content

fix: drop invalid expression rows#757

Open
nabinchha wants to merge 3 commits into
mainfrom
codex/issue-749-expression-row-drops
Open

fix: drop invalid expression rows#757
nabinchha wants to merge 3 commits into
mainfrom
codex/issue-749-expression-row-drops

Conversation

@nabinchha

Copy link
Copy Markdown
Contributor

📋 Summary

Expression columns currently fail the whole column when a single row renders to empty text or hits a row-specific Jinja/cast error. This PR makes those failures behave like other row-level generation failures: invalid rows are dropped with structured warning counts, while fully broken expressions still fail loudly.

🔗 Related Issue

Fixes #749

🔄 Changes

  • Drop only the affected row when an expression render is empty/whitespace, hits a row-specific template error, or fails dtype conversion.
  • Preserve input row indexes from ExpressionColumnGenerator so shrunken expression batches can be merged back safely.
  • Update the sync builder to allow expression row drops in both full-column and skip-aware merge paths.
  • Update the async scheduler to map expression results by row-group index, drop missing result rows, and treat all-dropped expression batches as fatal instead of salvaging the whole row group.
  • Add regression coverage for generator-level row drops, sync builder shrink behavior, skip-aware sync shrink behavior, partial async row-group shrink, and async all-dropped failure.

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • async_scheduler.py — this is the core async merge/error-boundary change, including the distinction between partial row drops and all-dropped expression failures.
  • dataset_builder.py — sync full-column and skip-aware paths now allow expression generators to shrink batches while preserving skip metadata.

🧪 Testing

  • make test passes (not run; focused engine checks were run instead)
  • Unit tests added/updated
  • E2E tests added/updated (N/A — no E2E surface changed)
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/column_generators/generators/test_expression.py packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py::test_expression_column_row_drops_shrink_sync_batch packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py::test_expression_column_row_drops_shrink_sync_skip_aware_batch packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py::test_expression_row_drops_shrink_async_row_group packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py::test_expression_all_dropped_async_row_group_fails_loudly -q
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py -q
  • .venv/bin/ruff check <changed files>
  • .venv/bin/ruff format --check <changed files>

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO) — existing commits do not include Signed-off-by trailers
  • Architecture docs updated (N/A — existing docs already describe dropped-row outcomes; this PR does not change the public config surface)

Handle per-row expression render and cast failures by dropping only affected rows, preserving row identity in sync and async full-column paths, and failing when no valid rows remain.

Fixes #749
@nabinchha nabinchha requested a review from a team as a code owner June 17, 2026 01:15
@nabinchha nabinchha deployed to agentic-ci June 17, 2026 01:15 — with GitHub Actions Active
@github-actions

Copy link
Copy Markdown
Contributor

PR #757 Review — fix: drop invalid expression rows

Summary

The PR converts row-specific expression failures (empty Jinja renders, row-level template exceptions, dtype-cast errors) from full-column failures into per-row drops with structured warning counts, while preserving "fail-loud" behavior when all rows in a batch drop. It threads the resulting variable-size DataFrames through three places that previously assumed rigid 1:1 input/output sizing:

  1. ExpressionColumnGenerator.generate — collects per-row drop reasons in a Counter, preserves input row indexes on the returned DataFrame, and raises UserTemplateError only when every row drops.
  2. Sync DatasetBuilder_run_full_column_generator_without_skip and _merge_skipped_and_generated now allow expression generators to shrink the buffer, with skip-metadata round-trip honoring allow_resize=True.
  3. Async AsyncTaskScheduler._run_batch — maps batch results by row-group index instead of positional offset, drops rows whose expression results are missing, and elevates UserTemplateError from worker tasks to a fatal scheduler error so all-dropped batches abort instead of silently zeroing the row group.

Test coverage is solid: generator-level drops (empty/template-error/cast-error/all-dropped), sync shrink, sync skip-aware shrink, async partial drop, and async all-dropped fatal.

Findings

Correctness

  • expression.py:55-65except Exception is too broad and degrades diagnostics for systemic config bugs. The bare catch will swallow UserTemplateUnsupportedFiltersError, mistyped variable references, and similar template-level (not row-level) errors. Because these usually fail every row identically, the user ends up seeing the generic Expression column 'X' produced no valid rows. instead of the actionable upstream message (e.g. "No filter named 'foo'"). Consider narrowing the catch to row-level shapes (UndefinedError, TypeError, ZeroDivisionError, …) and letting other UserTemplateError subclasses propagate, or preserving the original exception chain when the all-dropped error is raised (raise UserTemplateError(...) from last_exc) so the root cause is still visible.

  • expression.py:42-43 — duplicate dtype validation. ExpressionColumnConfig.dtype is already a Literal["int","float","str","bool"] enforced by Pydantic, and _cast_type's else branch already raises ValueError on the impossible case. Adding a third runtime guard (with its own _VALID_DTYPES constant) just creates a third source of truth that has to stay in sync with the Literal. Recommend dropping the new check and the constant.

  • expression.py:88-92_is_empty_rendered_expression None branch is dead code. safe_render returns str; it raises rather than returning None. The value is None branch can never fire today.

  • async_scheduler.py:1702-1713 — fatal-error log line uses task.row_index which is None for batch tasks. The new logger.error("Fatal expression failure on %s[rg=%s, row=%s]", task.column, task.row_group, task.row_index, ...) will always log row=None for the only path that reaches it (expression task_type == "batch"). Either drop the row= field or label it as task=batch.

  • async_scheduler.py:1797-1804 — duplicated fatal-error detection in two layers. _run_generator_call now bypasses its DatasetGenerationError wrapping for UserTemplateError, and _execute_task_inner_impl re-checks _is_fatal_expression_template_error on the same exception. The bypass is needed (otherwise the wrapped exception would fail the isinstance(UserTemplateError) check), but the symmetry would be clearer with a comment explaining why both checks exist, or by introducing a sentinel (e.g. _FatalExpressionDrop) wrapper that survives the wrap.

Code quality / reuse

  • Validation is duplicated between sync (_merge_row_dropped_generated_records) and async (_require_expression_row_drop_result). Both do: dedup-check, "indexes outside active set" check, length cap. Worth extracting a single helper (e.g. validate_row_drop_result(result_df, active_indices) -> None) to keep the two paths from drifting. The error messages also differ slightly — "Expression generator returned…" (sync) vs "Batch generator for column 'X' returned…" (async) — unifying them helps users grep logs.

  • retained_indexes: list[object] — overly broad type. A pandas index value is Hashable and in this codebase's actual use is int. list[Hashable] (or list[int]) is closer to ground truth and matches what _require_expression_row_drop_result later coerces with set(result_indexes).

  • _log_row_drops builds a sorted Counter breakdown manually. Counter.most_common() already gives a deterministic ordering by count; if the intent is alphabetical-by-name, a comment would help future readers, since the natural read of Counter is frequency-ordered.

Tests

  • Good: Coverage spans generator unit tests, sync builder paths (with and without skip), and both async outcomes (partial shrink + all-dropped fatal). The async tests exercise RowGroupBufferManager and CompletionTracker interactions, which is the right place to catch index-mapping bugs.

  • Gap: No test exercises a downstream column that depends on the expression result after rows shrink. With replace_buffer(allow_resize=True), the next column's generator now runs against a smaller batch — that's the intended behavior, but a regression test confirming expression → llm-text → expression chains stay aligned (and that skip-metadata round-trips through expression drops) would harden against future merge-back regressions.

  • Gap: test_generate_drops_row_specific_template_errors uses {{ 1 / denominator }} with denominator=0. Worth adding a case that exercises the broad except Exception with a non-arithmetic shape — e.g. {{ items[0] }} where items is sometimes [] — so the test suite would catch a future narrowing of the catch clause if someone tightens it without thinking through the failure surface.

Observability

  • Drop logs include a structured breakdown — good. Consider also emitting a counter through whatever metrics surface the engine already has (self._reporter?) so dashboards can track expression-drop rates without log scraping. Not blocking.

Backward compatibility / public surface

  • The behavior change is the whole point of the fix, but it's worth flagging in the changelog: prior to this PR, a single empty render aborted the column; after, it warns and drops the row. Consumers asserting len(output) == len(input) for expression columns will now see fewer rows. The PR description says "Architecture docs already describe dropped-row outcomes" — verify the public-facing docs (Fern site under fern/) actually reflect this for expression columns specifically; the existing dropped-row docs may have only described LLM/sampler paths.

Verdict

Approve with minor changes. The architectural shift (per-row drops with index-preserving merge) is sound, the async fatal-error path is correctly wired, and tests cover the important scenarios. The asks above (narrowing except Exception, removing the duplicate dtype validation, sharing the row-drop validator between sync and async, fixing the row=None log) are quality cleanups, not blockers. The downstream-chain test would be the most valuable add before merge.

@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR changes ExpressionColumnGenerator from a fail-the-whole-column model to a drop-the-bad-row model, matching the behavior of other row-level generation failures. Both the sync builder and async scheduler are updated to accept shrunken expression batches and propagate the dropped-row accounting (drop markers, buffer updates, completion tracking) correctly.

  • expression.py: Per-row loop now catches EmptyTemplateRenderError, generic template exceptions, and cast errors individually, preserves original DataFrame index for retained rows, and raises UserTemplateError (fatal) only when every row is dropped.
  • async_scheduler.py: _run_batch now carries active_row_indices alongside the batch DataFrame, routes expression results through a new _require_expression_row_drop_result validator, maps results back by row index via _batch_result_by_row_index, calls _drop_row for missing rows, and treats UserTemplateError from expression tasks as a fatal scheduler error.
  • dataset_builder.py: Both the direct full-column path and the skip-aware merge path now pass allow_resize=True for expression generators; a new _merge_row_dropped_generated_records helper handles index-based reconciliation with skip provenance, which is correctly preserved by the existing ID-based restore_skip_metadata mechanism.

Confidence Score: 4/5

The core logic is correct: row-index–based merge is sound, the ID-based skip-metadata restoration already tolerates dropped rows, and the fatal all-dropped boundary is correctly wired through both the async and sync paths.

The async scheduler change is the most complex part of this PR — it adds a new result-by-row-index dispatch path in _run_batch alongside the existing positional path, with a fatal-error shortcut through _run_generator_call and _on_worker_exception. The logic is correct and is covered by integration tests, but the dual dispatch in a concurrent scheduler is worth a second look before merge.

async_scheduler.py — the new expression row-drop path in _run_batch and the UserTemplateError fatal-error wiring deserve the most scrutiny.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/column_generators/generators/expression.py Adds per-row error handling to ExpressionColumnGenerator: catches EmptyTemplateRenderError, generic template errors, and cast errors individually, preserves original DataFrame index for retained rows, and raises UserTemplateError when all rows are dropped.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py Adds expression-aware row-drop handling in _run_batch: tracks active_row_indices, routes expression results through _require_expression_row_drop_result, maps results back by row index, drops unmatched rows via _drop_row, and propagates UserTemplateError as a fatal error.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py Sync builder updated to allow expression generators to shrink batches: adds _generator_supports_row_drops(), passes allow_resize=True for expression generators in both full-column and skip-aware merge paths, and adds _merge_row_dropped_generated_records() for index-based record reconciliation.
packages/data-designer-engine/tests/engine/column_generators/generators/test_expression.py Adds four regression tests covering empty-render drops, template-error drops, cast-error drops, and the all-dropped fatal case; existing resource_provider mock updated to include run_config.
packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py Adds two async integration tests: partial row-group shrink (verifies correct row-index tracking and buffer drop) and all-dropped fatal failure (verifies DatasetGenerationError propagation and empty output).
packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py Adds two sync builder tests: full-column batch shrink and skip-aware batch shrink, covering both code paths in dataset_builder.py.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant S as Scheduler / Builder
    participant E as ExpressionColumnGenerator
    participant B as BufferManager

    S->>E: generate(batch_df with active_row_indices as index)
    loop per row
        E->>E: render_template(record)
        alt EmptyTemplateRenderError or empty string
            E->>E: "drop_counts[EMPTY] += 1"
        else generic Exception
            E->>E: "drop_counts[TEMPLATE_ERROR] += 1"
        else cast fails
            E->>E: "drop_counts[TYPE_CAST] += 1"
        else success
            E->>E: records.append + retained_indexes.append
        end
    end
    alt all rows dropped
        E-->>S: raise UserTemplateError (fatal)
        S->>S: "_fatal_worker_error = exc → DatasetGenerationError"
    else partial drops
        E-->>S: "result_df (index = retained row indices)"
        S->>S: _require_expression_row_drop_result(validate subset)
        S->>S: _batch_result_by_row_index → dict[ri → record]
        loop ri in rg_size
            alt ri missing from result
                S->>B: "_drop_row(rg, ri, exclude_columns={col})"
            else ri present
                S->>B: update_cell(rg, ri, col, value)
            end
        end
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant S as Scheduler / Builder
    participant E as ExpressionColumnGenerator
    participant B as BufferManager

    S->>E: generate(batch_df with active_row_indices as index)
    loop per row
        E->>E: render_template(record)
        alt EmptyTemplateRenderError or empty string
            E->>E: "drop_counts[EMPTY] += 1"
        else generic Exception
            E->>E: "drop_counts[TEMPLATE_ERROR] += 1"
        else cast fails
            E->>E: "drop_counts[TYPE_CAST] += 1"
        else success
            E->>E: records.append + retained_indexes.append
        end
    end
    alt all rows dropped
        E-->>S: raise UserTemplateError (fatal)
        S->>S: "_fatal_worker_error = exc → DatasetGenerationError"
    else partial drops
        E-->>S: "result_df (index = retained row indices)"
        S->>S: _require_expression_row_drop_result(validate subset)
        S->>S: _batch_result_by_row_index → dict[ri → record]
        loop ri in rg_size
            alt ri missing from result
                S->>B: "_drop_row(rg, ri, exclude_columns={col})"
            else ri present
                S->>B: update_cell(rg, ri, col, value)
            end
        end
    end
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py:2063-2070
**Third guard in `_require_expression_row_drop_result` is unreachable**

Given that the two preceding checks already pass — (1) no duplicate result indexes, and (2) every result index is a member of `active_index_set` — the result set is a subset of `active_row_indices` with no duplicates. Because `active_row_indices` is itself duplicate-free (built from `range(rg_size)` minus pre-dropped rows), `len(active_index_set) == len(active_row_indices)`, so `len(result_df) <= len(active_row_indices)` is guaranteed and the third `if` can never fire. This is dead code, not a runtime bug, but it may create a false sense of coverage for this guard.

Reviews (1): Last reviewed commit: "Merge branch 'main' into codex/issue-749..." | Re-trigger Greptile

Comment on lines +2063 to 2070
result_records = result_df.to_dict(orient="records")
if supports_row_drops:
return dict(zip(result_df.index.to_list(), result_records, strict=True))
return dict(zip(active_row_indices, result_records, strict=True))

def _get_rg_size(self, row_group: int) -> int:
try:
return self._row_groups.row_group_size(row_group)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Third guard in _require_expression_row_drop_result is unreachable

Given that the two preceding checks already pass — (1) no duplicate result indexes, and (2) every result index is a member of active_index_set — the result set is a subset of active_row_indices with no duplicates. Because active_row_indices is itself duplicate-free (built from range(rg_size) minus pre-dropped rows), len(active_index_set) == len(active_row_indices), so len(result_df) <= len(active_row_indices) is guaranteed and the third if can never fire. This is dead code, not a runtime bug, but it may create a false sense of coverage for this guard.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py
Line: 2063-2070

Comment:
**Third guard in `_require_expression_row_drop_result` is unreachable**

Given that the two preceding checks already pass — (1) no duplicate result indexes, and (2) every result index is a member of `active_index_set` — the result set is a subset of `active_row_indices` with no duplicates. Because `active_row_indices` is itself duplicate-free (built from `range(rg_size)` minus pre-dropped rows), `len(active_index_set) == len(active_row_indices)`, so `len(result_df) <= len(active_row_indices)` is guaranteed and the third `if` can never fire. This is dead code, not a runtime bug, but it may create a false sense of coverage for this guard.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle row-level failures in Jinja expression columns by dropping invalid rows

1 participant