Conversation
There was a problem hiding this comment.
Pull request overview
Updates Delta Lake overwrite predicate generation to support partition filters where a partition key has multiple values (i.e., an IN (...) predicate), preventing failures when _filters() returns list-valued filter tuples.
Changes:
- Build overwrite predicates incrementally instead of
" ".join(tuple)so list-valued filters can be rendered ascol in (v1, v2, ...). - Introduce special handling for list-valued filter tuples when assembling the predicate string.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if isinstance(tuple_value[-1], list): | ||
| in_list = ", ".join(tuple_value[-1]) | ||
| predicate_list.append(" ".join([tuple_value[0], tuple_value[1], f"({in_list})"])) | ||
| else: | ||
| predicate_list.append(" ".join(tuple_value)) |
There was a problem hiding this comment.
Both branches build the predicate using str joins (, ".join(tuple_value[-1]) and " ".join(tuple_value)), which will raise TypeError if partition values aren’t already strings (e.g., int partitions). Since _filters() returns Any for the value, this should defensively stringify/format values (and ideally handle quoting/escaping for strings) instead of assuming pre-quoted string literals.
| for tuple_value in filter: | ||
| if isinstance(tuple_value[-1], list): | ||
| in_list = ", ".join(tuple_value[-1]) | ||
| predicate_list.append(" ".join([tuple_value[0], tuple_value[1], f"({in_list})"])) |
There was a problem hiding this comment.
If a partition key is provided with an empty list (len==0), _filters() currently emits an in filter and this code will produce col in (), which is typically an invalid Delta predicate. Consider filtering out empty lists in _filters() (skip the key) or raising a clear error before calling write_deltalake.
| predicate_list: List[str] = [] | ||
| for tuple_value in filter: | ||
| if isinstance(tuple_value[-1], list): | ||
| in_list = ", ".join(tuple_value[-1]) | ||
| predicate_list.append(" ".join([tuple_value[0], tuple_value[1], f"({in_list})"])) | ||
| else: | ||
| predicate_list.append(" ".join(tuple_value)) | ||
| predicate = operator.join(predicate_list) |
There was a problem hiding this comment.
This change adds new behavior for list-valued predicates (in (...)) but there’s no unit test coverage verifying overwrite(..., partitions={key: [v1, v2]}) actually deletes/overwrites only the intended partitions. Adding a test for the multi-value IN case would help prevent regressions (especially around literal formatting/quoting).
No description provided.