Summary
Implement NumPy ufunc interoperability for the Arkouda pandas ExtensionArray by adding a correct, well-scoped __array_ufunc__ implementation. This will allow common ufuncs (e.g., np.add, np.subtract, np.negative, np.logical_and, comparisons, etc.) to operate on Arkouda-backed Series/arrays without silently materializing to NumPy, while preserving pandas semantics where required.
Background / Motivation
Today, many NumPy ufunc operations on Arkouda-backed pandas objects either:
- fall back to object/NumPy materialization (breaking scalability), or
- error in inconsistent ways, or
- go through pandas code paths that expect ExtensionArrays to provide __array_ufunc__ and __array_priority__ behavior.
A minimal-but-correct __array_ufunc__ enables:
- predictable behavior for arithmetic and elementwise operations,
- better pandas compatibility (pandas frequently triggers ufunc paths),
- clear errors for unsupported dtypes (e.g., Strings, Categorical) or unsupported ufuncs/methods.
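Here is how the dispatch path works. Before NumPy converts anything to ndarray, it checks every input (and any out= argument) for an __array_ufunc__ method. If it finds one, it calls __array_ufunc__(ufunc, method, *inputs, **kwargs) instead of running the ufunc itself. The toy Recorder class below is purely illustrative and is not Arkouda code. It only records what NumPy passes in. That is enough to show how a single method can intercept np.add(x, 5), np.negative(x), and np.add.reduce(x) without x ever being materialized.

```python
import numpy as np


class Recorder:
    """Illustrative array-like that records how NumPy dispatches ufuncs to it."""

    def __init__(self):
        self.calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # ufunc: the np.ufunc object; method: "__call__", "reduce", ...;
        # inputs: the original positional operands, not converted to ndarray.
        self.calls.append((ufunc.__name__, method, len(inputs)))
        return "handled"


r = Recorder()
np.add(r, 5)       # -> "handled"
np.negative(r)     # -> "handled"
np.add.reduce(r)   # -> "handled"
print(r.calls)
# [('add', '__call__', 2), ('negative', '__call__', 1), ('add', 'reduce', 1)]
```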
Goals
- Add __array_ufunc__ to the Arkouda ExtensionArray implementation.
- Support elementwise ufuncs for numeric and boolean dtypes where there is a reasonable Arkouda mapping.
- Handle method="__call__" and method="reduce" (as appropriate) with clear scoping.
- Respect pandas expectations:
  - return an ExtensionArray (or a Series via pandas) when appropriate,
  - propagate np.nan / missing values correctly (where applicable),
  - preserve dtype where possible.
- Avoid accidental conversion to NumPy unless it is explicitly requested or unavoidable (e.g., out is a NumPy array, or the ufunc is not supported).
Non-goals (for this ticket)
- Full coverage of every NumPy ufunc and method (accumulate, reduceat, outer, etc.).
- Supporting ufuncs for Arkouda Strings and Categorical unless there is a clear, existing Arkouda primitive (these should raise a helpful TypeError for now).
- Implementing NumPy array protocol conversions beyond what is needed for ufunc interoperability.
Proposed Behavior
Supported inputs
- self is the Arkouda ExtensionArray.
- Additional inputs may include:
  - scalar Python numbers/bools,
  - NumPy scalars,
  - other Arkouda ExtensionArray instances,
  - pandas arrays/Series that wrap Arkouda arrays (unwrap as needed).
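The sketch below shows one way to normalize the supported inputs listed above. It uses two hypothetical names: ArkoudaArrayStub (its _data attribute stands in for the server-side pdarray) and unwrap_ufunc_input. pandas' extension-array guide asks ExtensionArrays to return NotImplemented when a Series, DataFrame, or Index appears among the inputs. pandas then unboxes the array and re-calls the ufunc. For that reason, this sketch handles the Series case by declining it rather than unwrapping it itself.

```python
import numpy as np


class ArkoudaArrayStub:
    """Hypothetical stand-in for the Arkouda ExtensionArray; `_data` plays the
    role of the server-side pdarray."""

    def __init__(self, data):
        self._data = data


# Python numbers and bools, plus NumPy numeric (np.number) and boolean (np.bool_) scalars.
_SCALAR_TYPES = (bool, int, float, np.number, np.bool_)


def unwrap_ufunc_input(x):
    """Map one ufunc input to a backend operand, or return NotImplemented."""
    if isinstance(x, ArkoudaArrayStub):
        return x._data  # another Arkouda array: use its backing data directly
    if isinstance(x, _SCALAR_TYPES):
        return x  # scalars pass through unchanged
    # Anything else (plain ndarrays, str, pandas Series/DataFrame/Index) is declined;
    # pandas unboxes its own containers and re-calls the ufunc on the array.
    return NotImplemented


def unwrap_inputs(inputs):
    """Unwrap every input, or return NotImplemented if any one is unsupported."""
    operands = [unwrap_ufunc_input(x) for x in inputs]
    if any(op is NotImplemented for op in operands):
        return NotImplemented
    return operands
```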
Dispatch rules
- Reject unsupported method values with NotImplemented (or TypeError if pandas expects it), except for:
  - __call__ (required)
  - reduce (optional, only for a small safe subset such as np.add.reduce and np.logical_or.reduce, if Arkouda equivalents exist)
- If any input is a higher-priority type that should handle the ufunc, return NotImplemented.
- Map ufuncs to Arkouda server-side ops:
  - Unary: negative, absolute, invert (for bool/int), etc.
  - Binary: add, subtract, multiply, true_divide, floor_divide, power (if supported), comparisons (equal, not_equal, less, greater, etc.), logical ops for bool.
- If out is provided:
  - If out contains Arkouda ExtensionArrays: write into them (if supported); otherwise reject with a clear error.
  - If out contains NumPy arrays: either materialize (explicit) or raise (preferred); pick one and document it.
- Return type:
  - For elementwise ops: return a new Arkouda ExtensionArray with the result.
  - For reduce: return a scalar (Python/NumPy scalar) or a 0-dim equivalent consistent with pandas expectations.
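The sketch below combines these dispatch rules into a minimal __array_ufunc__. It is a hypothetical stand-in, not Arkouda's implementation. ArkoudaArrayStub, _ELEMENTWISE, and _REDUCTIONS are illustrative names. A NumPy array stands in for the server-side pdarray, and operator/NumPy functions stand in for Arkouda server-side ops. For out=, it picks the "raise" policy. It supports reduce only for np.add and np.logical_or.

```python
import operator

import numpy as np

# Hypothetical dispatch tables. In the real ExtensionArray each entry would call
# an Arkouda server-side operation; `operator`/NumPy functions stand in here.
_ELEMENTWISE = {
    np.add: operator.add,
    np.subtract: operator.sub,
    np.multiply: operator.mul,
    np.true_divide: operator.truediv,
    np.negative: operator.neg,
    np.equal: operator.eq,
    np.less: operator.lt,
}
_REDUCTIONS = {np.add: np.sum, np.logical_or: np.any}


class ArkoudaArrayStub:
    """Hypothetical stand-in for the Arkouda ExtensionArray.

    `_data` is a NumPy array playing the role of the server-side pdarray.
    """

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Policy chosen for this sketch: reject out= instead of materializing.
        if kwargs.pop("out", None) is not None:
            raise TypeError(
                f"out= is not supported for NumPy ufunc '{ufunc.__name__}' "
                "on Arkouda arrays"
            )

        if method == "reduce":
            func = _REDUCTIONS.get(ufunc)
            arr = inputs[0]
            # Only whole-array reductions with no extra options (dtype, keepdims, ...).
            if (func is None or not isinstance(arr, ArkoudaArrayStub)
                    or kwargs.pop("axis", 0) not in (0, None) or kwargs):
                return NotImplemented
            return func(arr._data)  # a NumPy scalar

        if method != "__call__":
            # accumulate, reduceat, outer, at: out of scope for this ticket.
            return NotImplemented

        op = _ELEMENTWISE.get(ufunc)
        if op is None or kwargs:
            return NotImplemented  # NumPy then raises its own TypeError

        operands = []
        for x in inputs:
            if isinstance(x, ArkoudaArrayStub):
                operands.append(x._data)
            elif isinstance(x, (bool, int, float, np.number, np.bool_)):
                operands.append(x)
            else:
                # ndarrays, pandas containers, etc.: let them handle the ufunc.
                return NotImplemented
        return ArkoudaArrayStub(op(*operands))
```

Under this policy, np.add(ArkoudaArrayStub([1, 2, 3]), 5) returns a new ArkoudaArrayStub holding [6, 7, 8], and np.add.reduce on the same array returns 6. np.sin raises NumPy's generic TypeError because every override returned NotImplemented.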
Error messages
- For unsupported dtypes (Strings/Categorical): raise a TypeError like:
  "NumPy ufunc '<name>' is not supported for Arkouda dtype '<dtype>'"
- For unsupported ufuncs: raise NotImplementedError or return NotImplemented, depending on pandas expectations; include a message guiding users to convert explicitly if they really want NumPy.
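A small pair of helpers could keep the wording of these errors consistent across the dispatcher. The helper names and the set of unsupported dtype labels below are hypothetical; only the message template comes from this issue. A custom message is only possible when raising. If the code returns NotImplemented instead, NumPy raises its own generic TypeError.

```python
import numpy as np

# Hypothetical labels for Arkouda dtypes that have no ufunc support yet.
_UNSUPPORTED_DTYPES = {"string", "category"}


def check_ufunc_dtype(ufunc, dtype_name):
    """Raise the TypeError format proposed in this issue for unsupported dtypes."""
    if dtype_name in _UNSUPPORTED_DTYPES:
        raise TypeError(
            f"NumPy ufunc '{ufunc.__name__}' is not supported for "
            f"Arkouda dtype '{dtype_name}'"
        )


def unsupported_ufunc_error(ufunc, method):
    """Build the error for ufuncs outside the mapped subset, pointing users at
    explicit materialization."""
    return NotImplementedError(
        f"NumPy ufunc '{ufunc.__name__}' (method '{method}') has no Arkouda "
        "equivalent; convert explicitly with .to_numpy() if NumPy semantics "
        "are really wanted"
    )
```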
Implementation Notes
- Location: Arkouda pandas ExtensionArray class.
- Consider implementing a small internal dispatcher: _UFUNC_TABLE: dict[np.ufunc, callable], or a mapping keyed by ufunc.__name__.
- Centralize dtype checks and missing-value handling.
- Ensure correct behavior with:
  - __array_priority__ (set high enough to win dispatch vs NumPy when appropriate; note that for classes defining __array_ufunc__, NumPy routes ufuncs and ndarray binary operators through __array_ufunc__ rather than __array_priority__),
  - __array__ (if implemented), which must not accidentally trigger conversions in the ufunc path.
- Make sure __array_ufunc__ does not break Series ops that pandas already routes through its own arithmetic machinery.
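The suggested _UFUNC_TABLE could pair each ufunc with the dtype kinds it accepts. A single lookup function would then perform the centralized dtype check. This sketch makes several assumptions:
- The table is keyed by np.ufunc objects; keying by ufunc.__name__ would work the same way.
- resolve_ufunc is a hypothetical name.
- The array exposes a NumPy-compatible dtype for numeric/bool data.
- operator functions stand in for Arkouda server-side ops.

```python
import operator

import numpy as np

# Hypothetical dispatcher: ufunc -> (backend op, NumPy dtype kinds it accepts).
# Kinds: "b" = bool, "i"/"u" = signed/unsigned int, "f" = float.
_NUMERIC = frozenset("iuf")
_UFUNC_TABLE = {
    np.add: (operator.add, _NUMERIC),
    np.multiply: (operator.mul, _NUMERIC),
    np.negative: (operator.neg, _NUMERIC),
    np.absolute: (operator.abs, _NUMERIC),
    np.invert: (operator.invert, frozenset("biu")),
    np.logical_and: (operator.and_, frozenset("b")),
}


def resolve_ufunc(ufunc, dtype):
    """Centralized lookup: return the backend op or raise a consistent error."""
    entry = _UFUNC_TABLE.get(ufunc)
    if entry is None:
        raise NotImplementedError(
            f"NumPy ufunc '{ufunc.__name__}' is not supported on Arkouda arrays"
        )
    op, kinds = entry
    if np.dtype(dtype).kind not in kinds:
        raise TypeError(
            f"NumPy ufunc '{ufunc.__name__}' is not supported for "
            f"Arkouda dtype '{np.dtype(dtype).name}'"
        )
    return op
```

Restricting np.add to numeric kinds is a design choice in this sketch; NumPy itself accepts bool inputs for np.add.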
Repro / Expected UX
Example (should stay on Arkouda)
>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3], dtype="ak")
>>> np.add(s.array, 5).to_numpy() # materialize only at the end
array([6, 7, 8])
Example (unsupported dtype gives helpful error)
>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["a", "b"], dtype="ak")
>>> np.add(s.array, "x")
Traceback (most recent call last):
...
TypeError: NumPy ufunc 'add' is not supported for Arkouda dtype 'string'
Tests
Add unit tests covering:
- Unary ufuncs: np.negative, np.absolute (numeric)
- Binary ufuncs: np.add, np.subtract, np.multiply, np.true_divide (numeric)
- Comparisons: np.equal, np.less, etc. (numeric/bool)
- Mixed scalar + EA and EA + EA
- out= behavior (whatever policy is chosen)
- Unsupported ufunc raises/returns NotImplemented in a predictable way
- Unsupported dtype (Strings/Categorical) raises a clear TypeError
- No silent to_numpy() / materialization occurs in the supported paths:
  - validate that the result is an Arkouda ExtensionArray (or wraps one)
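For the "no silent materialization" check, one option is a guard that temporarily replaces the materializing methods with ones that fail the test. The sketch below is self-contained. It uses a hypothetical, np.add-only ArkoudaArrayStub rather than the real class. In a pytest suite, the monkeypatch fixture's setattr could play the same role as the forbid_materialization context manager.

```python
import contextlib

import numpy as np


class ArkoudaArrayStub:
    """Hypothetical stand-in for the Arkouda ExtensionArray (handles np.add only)."""

    def __init__(self, data):
        self._data = np.asarray(data)  # plays the role of the server-side pdarray

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if ufunc is not np.add or method != "__call__" or kwargs:
            return NotImplemented
        a, b = (x._data if isinstance(x, ArkoudaArrayStub) else x for x in inputs)
        return ArkoudaArrayStub(a + b)

    def to_numpy(self):
        return self._data.copy()

    def __array__(self, dtype=None, copy=None):
        # `copy` is accepted for NumPy 2 compatibility and ignored in this sketch.
        out = self.to_numpy()
        return out if dtype is None else out.astype(dtype)


_MATERIALIZERS = ("to_numpy", "__array__")


@contextlib.contextmanager
def forbid_materialization(cls):
    """Temporarily replace the materializing methods of `cls` with ones that fail."""
    def _fail(self, *args, **kwargs):
        raise AssertionError(f"{cls.__name__} was materialized to NumPy")

    saved = {name: cls.__dict__[name] for name in _MATERIALIZERS}
    try:
        for name in _MATERIALIZERS:
            setattr(cls, name, _fail)
        yield
    finally:
        for name, original in saved.items():
            setattr(cls, name, original)


def test_add_stays_on_backend():
    with forbid_materialization(ArkoudaArrayStub):
        result = np.add(ArkoudaArrayStub([1, 2, 3]), 5)
    # Materialize only after the guarded region, mirroring the issue's example.
    assert isinstance(result, ArkoudaArrayStub)
    assert result.to_numpy().tolist() == [6, 7, 8]


def test_guard_detects_materialization():
    with forbid_materialization(ArkoudaArrayStub):
        try:
            ArkoudaArrayStub([1]).to_numpy()
        except AssertionError:
            return
    raise AssertionError("guard did not fire")
```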
Acceptance Criteria
- __array_ufunc__ is implemented on the Arkouda ExtensionArray.
- Core elementwise numeric ufuncs work end-to-end without NumPy materialization.
- Unsupported ufuncs/dtypes produce clear, consistent errors.
- Test suite includes coverage for supported, unsupported, and edge cases (including out=).
- Documentation/comments explain the supported ufunc surface and rationale for exclusions.