Commit faadf8d
[SPARK-55242][PYSPARK] Handle np.ndarray elements in object-dtype columns when converting from pandas
When a pandas DataFrame contains list-valued columns (e.g. a column
created via `[[e] for e in ...]`), pandas 3 stores each list element
internally as a `np.ndarray` object rather than a plain Python list.
The existing `DataTypeOps.prepare()` method calls:

```python
col.replace({np.nan: None})
```

on the pandas Series before passing it to Spark's `createDataFrame`.
When the Series has dtype `object` and its elements are `np.ndarray`
objects, pandas 3 raises:

```
ValueError: The truth value of an array is ambiguous. Use a.any() or a.all()
```

because comparing a numpy array with `==` yields an element-wise boolean
array rather than a single truth value, which breaks the scalar
comparisons that `replace` performs internally.
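The error is standard NumPy behavior rather than anything pandas-specific: truth-testing a multi-element array raises. A minimal demonstration of the underlying ambiguity:

```python
import numpy as np

# Element-wise comparison of arrays yields a boolean array, not a scalar.
comparison = np.array([1, 2]) == np.array([1, 2])  # array([ True,  True])

# Truth-testing that array (as replace's internal comparisons effectively
# do) raises the ValueError quoted above.
try:
    bool(comparison)
except ValueError as exc:
    print(exc)
```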
Fix: detect object-dtype columns whose first non-null element is a
`np.ndarray` and convert each such element to a plain Python list via
`.tolist()` before performing the NaN-to-None substitution. This also
lets PyArrow correctly infer the column type as `ArrayType` for the
resulting Spark schema.
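The approach can be sketched as follows; `prepare_series` is a hypothetical stand-in for the patched `DataTypeOps.prepare()`, not the actual code of this commit:

```python
import numpy as np
import pandas as pd

def prepare_series(col: pd.Series) -> pd.Series:
    # Illustrative sketch; the real change lives in DataTypeOps.prepare().
    if col.dtype == object:
        non_null = col.dropna()
        # Detect list-valued columns that pandas 3 stores as ndarray elements.
        if not non_null.empty and isinstance(non_null.iloc[0], np.ndarray):
            # Convert every ndarray element to a plain Python list so the
            # NaN-to-None replace() below never truth-tests an array.
            col = col.map(
                lambda v: v.tolist() if isinstance(v, np.ndarray) else v
            )
    return col.replace({np.nan: None})

s = pd.Series([np.array([1, 2]), np.array([3, 4])], dtype=object)
print(prepare_series(s).tolist())  # [[1, 2], [3, 4]]
```

Checking only the first non-null element (rather than every element) keeps the detection cheap, matching the description above.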
### Does this PR introduce _any_ user-facing change?
No - this is a regression fix. Previously `ps.from_pandas(pdf)` with a
list-valued column raised an error; after the fix it succeeds and the
data round-trips correctly.
### How was this patch tested?
Added `test_from_pandas_with_np_array_elements` in
`pyspark/pandas/tests/data_type_ops/test_complex_ops.py`, which
reproduces the exact scenario reported in SPARK-55242.
Closes #SPARK-552421
2 files changed
Lines changed: 32 additions & 0 deletions
(diff content not captured in extraction; additions at new lines 551–561)
Lines changed: 21 additions & 0 deletions
(diff content not captured in extraction; additions at new lines 21 and 251–270)