Summary
Implement ArkoudaCategoricalArray.__setitem__ to support
pandas-compatible item assignment into Arkouda-backed categorical
ExtensionArrays.
This is required for common pandas workflows such as:
- Series.loc[...] = ... / Series.iloc[...] = ...
- boolean mask assignment
- where/mask
- fillna and other in-place manager paths
- categorical value replacement without dtype loss
Currently, assignment into Arkouda categorical arrays is missing or
inconsistent, leading to TypeError/NotImplementedError or pandas
fallback behavior (often converting to object/NumPy).
Background / Why
pandas Categorical supports item assignment with strict rules:
- Assigned values must be existing categories or missing
- New categories are not implicitly added (unless the user explicitly
adds them via add_categories or a similar higher-level API)
- Missing values are supported and propagate through codes/mask
- Assignment must preserve dtype
(CategoricalDtype(categories=..., ordered=...))
For Arkouda-backed categoricals, we want identical semantics while
keeping operations server-side where possible.
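As a reference point, the rules above can be observed on a plain pandas Categorical. This is only an illustration of the target semantics with stock pandas, not Arkouda code; the exact exception type may vary by pandas version, so both are caught here.

```python
import pandas as pd

# Reference behavior of a plain pandas Categorical (no Arkouda involved).
cat = pd.Categorical(["a", "b", "a"], categories=["a", "b"])
dtype_before = cat.dtype

cat[0] = "b"      # existing category: allowed
cat[1] = None     # missing: allowed, stored as code -1

try:
    cat[2] = "c"  # not a category: rejected, categories are not extended
except (TypeError, ValueError) as exc:
    print(type(exc).__name__, exc)

print(cat.codes.tolist())         # [1, -1, 0]
print(cat.dtype == dtype_before)  # True: categories and ordered flag unchanged
```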
Requirements / Expected pandas Semantics
Given categories ["a", "b"]:
- Assign existing category: cat[0] = "b" is allowed
- Assign missing: cat[0] = None / pd.NA is allowed and marks the entry missing
- Assign value not in categories: cat[0] = "c" should raise (typically
TypeError/ValueError depending on path)
- pandas message often indicates: "Cannot setitem on a Categorical
with a new category..."
- Assignment via indexers should work:
- int, slice, boolean mask, integer array indexer
- Broadcasting rules:
- scalar value broadcasts to all targeted positions
- array-like values must match number of targeted positions
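The indexer and broadcasting rules above could be handled by two small helpers, sketched below. The names resolve_positions and broadcast_values are hypothetical (they are not part of Arkouda or pandas), and NumPy arrays stand in for whatever server-side arrays the real implementation would use.

```python
import numpy as np


def resolve_positions(key, length):
    """Turn an int, slice, boolean mask, or integer indexer into positions."""
    if isinstance(key, (bool, np.bool_)):
        raise IndexError("a scalar boolean is not a valid indexer")
    if isinstance(key, (int, np.integer)):
        if not -length <= key < length:
            raise IndexError(f"index {key} is out of bounds for length {length}")
        return np.array([key % length], dtype=np.int64)
    if isinstance(key, slice):
        return np.arange(length, dtype=np.int64)[key]

    key = np.asarray(key)
    if key.size == 0:
        return np.array([], dtype=np.int64)
    if key.dtype == bool:
        if key.shape != (length,):
            raise IndexError("boolean mask must have the same length as the array")
        return np.flatnonzero(key)
    if key.dtype.kind not in "iu":
        raise IndexError("indexer must be int, slice, boolean mask, or integer array")

    positions = key.astype(np.int64)
    if ((positions < -length) | (positions >= length)).any():
        raise IndexError("integer indexer is out of bounds")
    return np.where(positions < 0, positions + length, positions)


def broadcast_values(values, n_targets):
    """Scalars broadcast; array-likes must match the number of targets."""
    if np.ndim(values) == 0:
        return [values] * n_targets
    values = list(values)
    if len(values) != n_targets:
        raise ValueError(
            f"cannot assign {len(values)} values to {n_targets} targeted positions"
        )
    return values


print(resolve_positions(slice(1, None, 2), 5).tolist())       # [1, 3]
print(resolve_positions([True, False, True], 3).tolist())     # [0, 2]
print(broadcast_values("a", 3))                               # ['a', 'a', 'a']
```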
Scope
In Scope
- Implement ArkoudaCategoricalArray.__setitem__(key, value)
- Support keys:
- int position
- slice
- boolean mask (same length)
- integer indexer (array-like positions)
- Support values:
- scalar category label
- scalar missing (None, pd.NA, possibly np.nan)
- array-like of labels/missing matching target selection length
- another ArkoudaCategoricalArray (assignment by position)
- Enforce "no new categories" rule
- Preserve:
- categories
- ordered flag
- dtype and internal representation (codes + categories + missing
marker/mask)
- Add unit tests
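One possible shape for the core of the in-scope work is to translate incoming labels to codes, reject anything that is not an existing category, and write only into the codes. The sketch below assumes missing entries are stored as code -1 (as pandas does) and uses a NumPy array as a stand-in for the Arkouda-side codes; labels_to_codes and setitem_codes are hypothetical names, and no Arkouda API is used.

```python
import numpy as np
import pandas as pd

MISSING_CODE = -1  # assumption: missing entries use code -1, as in pandas


def labels_to_codes(values, categories):
    """Map labels to integer codes, rejecting labels that are not categories."""
    lookup = {label: code for code, label in enumerate(categories)}
    codes = np.empty(len(values), dtype=np.int64)
    for i, value in enumerate(values):
        if pd.isna(value):  # None, pd.NA, and np.nan all count as missing
            codes[i] = MISSING_CODE
        elif value in lookup:
            codes[i] = lookup[value]
        else:
            raise TypeError(
                f"Cannot setitem on a Categorical with a new category ({value!r})"
            )
    return codes


def setitem_codes(codes, categories, key, value):
    """Assign `value` at `key` by rewriting codes in place; categories are untouched."""
    positions = np.atleast_1d(np.arange(codes.size)[key])
    values = [value] * positions.size if np.ndim(value) == 0 else list(value)
    if len(values) != positions.size:
        raise ValueError("length of values does not match number of targeted positions")
    # All values are validated before the write, so a rejected value leaves codes unchanged.
    codes[positions] = labels_to_codes(values, categories)


categories = ["a", "b"]
codes = np.array([0, 1, 0, 1], dtype=np.int64)
setitem_codes(codes, categories, 0, "b")                            # int position
setitem_codes(codes, categories, [False, True, False, True], None)  # boolean mask
setitem_codes(codes, categories, slice(2, 4), ["a", "b"])           # slice + array-like
print(codes.tolist())  # [1, -1, 0, 1]
```

Because only the codes are rewritten, the categories and ordered flag are preserved by construction. The per-element Python loop is for clarity only; a real implementation would presumably perform the lookup server-side.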
Out of Scope
- Adding categories automatically during setitem
- Implementing add_categories / remove_categories (if not already
present)
- 2D assignment (categorical EA is 1D)
- Alignment by Index labels (handled by pandas, not EA)