feat: add DataFrameLike parameter for cross-backend dataframe inputs #1144
Open
ghostiee-11 wants to merge 3 commits into
`param.DataFrame` is restricted to pandas. `DataFrameLike` accepts any object Narwhals recognises (pandas, Polars, PyArrow, cuDF, Modin) and passes it through unchanged, so existing pandas-only code is unaffected (`param.DataFrame` is not touched).

* New `DataFrameLike(ClassSelector)` validating via `narwhals.from_native(eager_only=not allow_lazy, pass_through=False)`. Narwhals is an optional dependency, deferred like pandas is for `DataFrame`.
* Same `rows` / `columns` / `ordered` slots as `DataFrame`, driven through the Narwhals wrapper so they work on every backend. Column names are read via `collect_schema().names()` so lazy frames are not implicitly collected.
* `allow_lazy=True` opts into lazy frames (Polars LazyFrame, Dask, DuckDB); row-count validation is skipped for lazy frames.
* Backend-neutral `serialize` (list of records via Narwhals); `deserialize` reuses `DataFrame.deserialize` since JSON carries no backend information.
* `_length_bounds_check` extracted to a module-level helper shared by `DataFrame` and `DataFrameLike` (behaviour-preserving; testpandas unchanged).
* `tests/testdataframelike.py` covering pandas / Polars / PyArrow / lazy / serialization; narwhals + polars added to test-only dependencies.
* Raise a clear ImportError naming the install command when the optional narwhals package is missing, instead of a bare ModuleNotFoundError (declaration-time fail-fast, matching how DataFrame fails on missing pandas).
* Document the serialization asymmetry (backend-neutral records out, pandas in) and that cuDF/Modin are Narwhals-supported but not run in CI (cuDF is GPU-only, Modin's pinned deps conflict with the test environment).
* Annotate the inherited in-place ordered defaulting as deliberate DataFrame parity.
* Add a skip-guarded Modin test and add narwhals + polars to the type-check environment so pyright validates the Narwhals API rather than skipping an unresolved import.
Remove cuDF/Modin name-drops from the docstring, error message and tests. They are reachable through Narwhals like any other backend but are not exercised here (no GPU; Modin's pinned deps conflict), so naming them as features overclaims. The validation path is described generically as "any Narwhals-supported backend" with pandas, Polars and PyArrow as the tested set. Drops the permanently-skipped Modin test.
`param.DataFrame` only accepts `pandas.DataFrame`, so there is no way to declare a parameter that holds tabular data when the value might be Polars, PyArrow, or another backend. This adds a new `DataFrameLike` parameter that validates anything the Narwhals protocol recognises and passes the native object through unchanged. `param.DataFrame` is deliberately left untouched, so existing pandas-only code keeps its guarantee. This is the separate-class direction discussed in #975; serialization backend-preservation is intentionally left as an open question there.

Same `rows` / `columns` / `ordered` slots as `DataFrame` (driven through Narwhals so they work on every backend), plus `allow_lazy=True` for Polars `LazyFrame` / Dask / DuckDB with no implicit collect. Narwhals is an optional dependency, deferred like pandas is for `DataFrame`, with a clear install message if missing.

Before


After
Tested with pandas, Polars (eager + lazy), and PyArrow; full suite 1550 passed, `testpandas.py` unchanged. Validation goes entirely through Narwhals, so any other Narwhals-supported backend uses the identical code path.
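As a rough illustration of that single code path, the validation could be sketched as a standalone function. `validate_frame` and its `columns` argument are hypothetical names, not the PR's implementation; the sketch assumes Narwhals raises `TypeError` for objects it does not recognise when `pass_through=False`.

```python
def validate_frame(obj, allow_lazy=False, columns=None):
    """Validate `obj` through Narwhals and return it unchanged (hypothetical helper)."""
    import narwhals as nw  # deferred so this module imports without Narwhals

    try:
        # pass_through=False asks Narwhals to raise on unsupported objects;
        # eager_only=True additionally rejects lazy frames.
        frame = nw.from_native(obj, eager_only=not allow_lazy, pass_through=False)
    except TypeError as exc:  # assumption: Narwhals signals rejection with TypeError
        raise ValueError(f"not a supported dataframe: {type(obj).__name__}") from exc

    if columns is not None:
        # collect_schema() reads column names without collecting a lazy frame.
        names = frame.collect_schema().names()
        missing = [c for c in columns if c not in names]
        if missing:
            raise ValueError(f"missing columns: {missing}")

    return obj  # the native object is returned, not the Narwhals wrapper
```

Nothing in the function branches on the backend: a pandas, Polars, or PyArrow table all go through the same `from_native` call, which is the property the description relies on.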