librt.vecs
==========

The ``librt.vecs`` module defines the ``vec`` type, a low-level, uniform growable array type.
It's part of the ``librt`` package on PyPI.

When constructing a ``vec``, the item type ``T`` is always given explicitly via ``vec[T]``::

    from librt.vecs import append, vec

    v = vec[float]([1.0, 2.5])  # Construct vec[float] with two items

``vec`` supports many sequence operations, though it's not a full sequence type::

    len(v)  # 2
    v[0]    # 1.0
    v[-1]   # 2.5
    for x in v:
        print(x)

The length of each ``vec`` value is immutable. Appending an item is still a fast operation,
but it returns a new ``vec`` value::

    v = append(v, -0.5)
    print(v)  # vec[float]([1.0, 2.5, -0.5])

``vec`` only supports simple, uniform item types. It uses an efficient packed binary encoding
for these *value item types*:

* ``mypy_extensions.i64`` (signed 64-bit integer)
* ``mypy_extensions.i32`` (signed 32-bit integer)
* ``mypy_extensions.i16`` (signed 16-bit integer)
* ``mypy_extensions.u8`` (unsigned 8-bit integer)
* ``float`` (64-bit float)
* ``bool``

``int`` is not a valid item type, since it has arbitrary precision and ``vec`` is an
efficiency-focused type. Use one of the fixed-width integer types instead.
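
Python's ``struct`` module shows why fixed-width types are required, since it packs values
into fixed-size binary slots. The sketch below assumes, for illustration only, that the
value item types correspond to the standard ``struct`` format codes ``q``, ``i``, ``h``, ``B``
and ``d``. It does not describe how ``librt`` actually stores items. Any ``i64`` fits in an
8-byte slot, but an arbitrary-precision ``int`` such as ``2**70`` does not::

    import struct

    # Assumed, illustrative mapping from vec value item types to struct codes.
    # "<" selects standard sizes with no padding, so widths don't depend on the platform.
    WIDTHS = {
        "i64": struct.calcsize("<q"),
        "i32": struct.calcsize("<i"),
        "i16": struct.calcsize("<h"),
        "u8": struct.calcsize("<B"),
        "float": struct.calcsize("<d"),
    }


    def fits_in_i64(n: int) -> bool:
        """Return True if n can be stored in a signed 64-bit slot."""
        try:
            struct.pack("<q", n)
        except (struct.error, OverflowError):
            return False
        return True


    print(WIDTHS)  # {'i64': 8, 'i32': 4, 'i16': 2, 'u8': 1, 'float': 8}
    print(fits_in_i64(2**63 - 1))  # True
    print(fits_in_i64(2**70))  # False: a Python int has no fixed width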

Class item types (e.g. ``str`` or ``MyNativeClass``) are represented as regular object references.
Optional class item types (e.g. ``str | None``) are supported for convenience, but arbitrary
union types are not supported as item types. Nested vecs are supported, e.g. ``vec[vec[i64]]``.

A vec value is often used as an efficient alternative to ``list`` or ``array.array`` in code
compiled using mypyc. Its primary advantages are an efficient packed memory representation
for value item types and very fast inlined get and set item operations.
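
You can approximate the memory effect of a packed encoding in plain Python. The sketch below
compares a ``list`` of floats, which holds references to separately allocated float objects,
with an ``array.array`` of C doubles, which holds raw 8-byte values in a single buffer. It does
not measure ``vec`` itself, and exact byte counts depend on the Python version and platform::

    import sys
    from array import array

    N = 1000
    values = [float(i) + 0.5 for i in range(N)]  # N separate float objects

    # A list stores N pointers, and each float is its own heap object.
    list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(x) for x in values)

    # array("d") stores N raw doubles in one contiguous, packed buffer.
    packed = array("d", values)
    packed_bytes = sys.getsizeof(packed)

    print(list_bytes, packed_bytes)
    print(f"the list uses about {list_bytes / packed_bytes:.1f}x the memory")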

Vec instances perform runtime checking of item types. Since values of type variables are
not available at runtime (they are *erased*), type variables can't be used as item types.

A vec value is effectively an immutable (length, buffer) pair. This means that any operation
that changes the length of a vec, including ``append`` as we saw above, returns a modified
value.
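
A pure-Python toy model can make the (length, buffer) idea concrete. ``ToyVec`` and
``toy_append`` are hypothetical names used only for illustration. The doubling growth policy
is also an assumption, because the real ``vec`` is implemented natively and its reallocation
strategy is an implementation detail. The model shows why the result of ``append`` must be
assigned back. The old value keeps its old length, even when the new value reuses its buffer::

    from dataclasses import dataclass
    from typing import Any


    @dataclass(frozen=True)
    class ToyVec:
        """Hypothetical model of a vec value: an immutable (length, buffer) pair."""

        length: int
        buffer: list  # capacity == len(buffer); slots past `length` are unused

        def items(self) -> list:
            """Return the visible items, ignoring unused capacity."""
            return self.buffer[: self.length]


    def toy_append(v: ToyVec, item: Any) -> ToyVec:
        """Return a new ToyVec with item appended; v itself keeps its length."""
        if v.length < len(v.buffer):
            # Unused capacity: write into the shared buffer without reallocating.
            v.buffer[v.length] = item
            return ToyVec(v.length + 1, v.buffer)
        # No free capacity: allocate a larger buffer (doubling is assumed) and copy.
        new_buffer = v.buffer[: v.length] + [None] * max(v.length, 1)
        new_buffer[v.length] = item
        return ToyVec(v.length + 1, new_buffer)


    v = ToyVec(2, [1.0, 2.5, None])  # length 2, capacity 3
    w = toy_append(v, -0.5)  # fits in unused capacity, so the buffer is reused
    print(v.items(), w.items())  # [1.0, 2.5] [1.0, 2.5, -0.5]
    print(v.buffer is w.buffer)  # True
    x = toy_append(w, 4.0)  # buffer is full, so a new one is allocated
    print(x.items(), x.buffer is w.buffer)  # [1.0, 2.5, -0.5, 4.0] False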

.. note::
   An immutable length allows mypyc to generate more efficient code, and vec values
   can be allocated to machine registers effectively. However, vec values must be boxed
   when used in a non-native context, such as when added to a list or dict.

Here are some examples of valid vec types:

.. list-table::
   :header-rows: 1

   * - Type
     - Item representation
   * - ``vec[i32]``
     - Packed 32-bit integers
   * - ``vec[float]``
     - Packed 64-bit floats
   * - ``vec[str]``
     - Object references
   * - ``vec[vec[u8]]``
     - Packed vec values

The ``vec`` class
-----------------

.. class:: vec[T](items: Iterable[T] = ..., *, capacity: i64 = ...)

   A generic growable array type. The runtime type parameter ``T`` used when
   calling ``vec[T](...)`` determines the item type.

   The ``capacity`` parameter specifies the minimum initial capacity of the
   buffer, some of which may be unused after construction. Unused capacity
   allows fast ``append`` and ``extend`` operations that don't need to
   reallocate the buffer. The actual capacity will be larger than ``capacity``
   if ``items`` has more than ``capacity`` items.

   Construction from ``list`` and ``tuple`` objects is optimized.
   For value item types, construction from an object that implements
   the buffer protocol (such as ``bytes``) is also optimized, if the buffer
   format is compatible with the vec item type.

   Mypyc treats ``vec[T]([x] * n)`` as a special form. For example,
   ``vec[u8]([0] * n)`` constructs a zero-initialized vec object
   efficiently, without building an intermediate list. Other
   constructor-related special forms are described in `Special forms`_ below.

   It's an error to construct a ``vec`` object without providing an
   item type: ``vec()`` raises an exception.

   .. describe:: len(v) → i64

      Return the length of ``v``.

   .. describe:: v[i] → T

      Return the item at index ``i`` (index may be negative).

   .. describe:: v[i:j] → vec[T]

      Return a slice. This constructs a new ``vec`` object. ``i`` and ``j`` may be negative.

   .. describe:: v[i] = o

      Assign to an item (index may be negative).

   .. describe:: o in v → bool

      Return ``True`` if ``v`` contains ``o``.

   .. describe:: for o in v

      Iterate over items.

   .. describe:: memoryview(v)

      ``vec`` implements the buffer protocol, but only for value item types that use a
      packed representation.
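
Buffer compatibility comes down to a buffer's element format and item size. ``memoryview``
exposes these as ``format`` and ``itemsize``, both when constructing a ``vec`` from a buffer
and when reading ``memoryview(v)``. The sketch below inspects a few standard-library buffers.
This page does not list which formats each ``vec`` item type accepts. Pairing ``"B"`` with
``u8`` and ``"d"`` with ``float`` is an assumption based on the item widths listed above::

    from array import array


    def buffer_layout(obj: object) -> tuple[str, int]:
        """Return the (format, itemsize) pair of an object's buffer."""
        with memoryview(obj) as m:
            return m.format, m.itemsize


    print(buffer_layout(b"foo"))                  # ('B', 1): one unsigned byte per item
    print(buffer_layout(bytearray(4)))            # ('B', 1)
    print(buffer_layout(array("d", [1.0, 2.5])))  # ('d', 8): C doubles
    print(buffer_layout(array("q", [1, 2])))      # ('q', 8) on common platforms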

Functions
---------

Since the following operations return a modified value, they are module-level functions
instead of methods.

.. function:: append(v: vec[T], o: T) -> vec[T]

   Return ``v`` with item ``o`` appended to it. If ``v`` has unused capacity, reuse
   the existing buffer. The time complexity is amortized O(1). Example::

       from librt.vecs import append, vec
       from mypy_extensions import i32
       v = vec[i32]()
       v = append(v, 1)
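
Growable arrays usually achieve amortized O(1) appends by growing their buffer geometrically,
so reallocations become rarer as the array grows. The simulation below counts element copies
under an assumed doubling policy. The growth factor ``librt`` actually uses is an
implementation detail and may differ. With doubling, ``n`` appends copy fewer than ``2 * n``
items in total. Enough initial ``capacity`` avoids copying altogether::

    def count_copies(n: int, initial_capacity: int = 0) -> int:
        """Count element copies made by n appends under an assumed doubling policy."""
        length, capacity, copies = 0, initial_capacity, 0
        for _ in range(n):
            if length == capacity:
                # Buffer is full: allocate one twice as large and copy existing items.
                capacity = max(1, capacity * 2)
                copies += length
            length += 1
        return copies


    for n in (10, 1000, 1_000_000):
        print(n, count_copies(n))  # 15, 1023 and 1048575 copies respectively
    print(count_copies(1000, initial_capacity=1000))  # 0: no reallocation needed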

.. function:: extend(v: vec[T], it: Iterable[T]) -> vec[T]

   Return ``v`` with all items from iterable ``it`` appended to it. If ``v`` has sufficient
   unused capacity, reuse the existing buffer. The time complexity is amortized O(n),
   where n is the length of ``it``. Example::

       from librt.vecs import extend, vec
       from mypy_extensions import u8
       v = vec[u8]()
       v = extend(v, b"foo")

.. function:: remove(v: vec[T], o: T) -> vec[T]

   Return ``v`` with the first instance of item ``o`` removed. Reuse the buffer
   from ``v``. Raise ``ValueError`` if ``v`` doesn't contain ``o``. Example::

       from librt.vecs import remove, vec
       from mypy_extensions import i32
       v = vec[i32]([1, 2, 3])
       v = remove(v, 2)
       # v has items [1, 3]

.. function:: pop(v: vec[T], i: i64 = -1) -> tuple[vec[T], T]

   Return ``(new_v, item)``, where ``item`` is the value at index ``i`` and
   ``new_v`` is ``v`` with that item removed. Reuse the buffer from ``v``.
   Example::

       from librt.vecs import pop, vec
       from mypy_extensions import i32
       v = vec[i32]([1, 2, 3])
       v, x = pop(v)
       # x is 3; v has items [1, 2]

Special forms
-------------

Certain expressions that would involve multiple separate operations in
regular Python are guaranteed to be compiled by mypyc into direct operations,
with no unnecessary temporary objects.

.. list-table::
   :header-rows: 1

   * - Special form
     - Description
   * - ``vec[T]()``
     - Construct an empty vec with no buffer. This doesn't perform any dynamic allocation
       (at least for non-nested vecs).
   * - ``vec[T]([element1, ...])``
     - Directly construct a vec object with the given items, without a temporary list.
   * - ``vec[T]([element1] * n)``
     - Directly construct a vec with length n, without any temporary list.
   * - ``vec[T]([<expr> for ... in <expr>])``
     - Construct a vec from a comprehension without creating a temporary list.

Thread safety
-------------

In free-threaded Python builds, it's unsafe to write or modify an item if other
threads might be concurrently accessing *the same item*. For example, writing ``v[4]``
is not safe if another thread might be reading ``v[4]``. Similarly, it's not safe for
two threads to concurrently call ``append`` or ``remove`` on the same vec object.

This is different from list objects, since vec is a lower-level type where implicit
synchronization would have a significant performance cost. However, since vec lengths
are immutable, some race conditions that lists can be susceptible to are not possible
with vecs.
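
If several threads must grow a shared vec, one option is a lock that covers both the
``append`` call and the reassignment of its result. The reassignment matters because
``append`` returns a new value. Without the lock, two threads could start from the same old
value and one item would be lost. The sketch below uses a plain ``tuple`` and a hypothetical
``tuple_append`` helper as stand-ins for a vec and ``append``, so it runs without ``librt``.
Only the locking pattern is the point::

    import threading


    def tuple_append(v: tuple, item: object) -> tuple:
        """Hypothetical stand-in for append(): returns a new value, like vec."""
        return v + (item,)


    class SharedVec:
        """Holds a value that is replaced (not mutated) on every append."""

        def __init__(self) -> None:
            self._lock = threading.Lock()
            self.value: tuple = ()

        def append(self, item: object) -> None:
            # Read, append and store back as one critical section.
            with self._lock:
                self.value = tuple_append(self.value, item)


    shared = SharedVec()


    def worker(tid: int) -> None:
        for i in range(1000):
            shared.append((tid, i))


    threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(shared.value))  # 4000: no appends were lost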

Implementation details
----------------------

In a native context, such as a local variable or a parameter in a native function,
or an attribute of a native class, vec values are implemented as unboxed values with two
fields: length and buffer. The buffer is a normal Python object, but it's not directly
accessible to users. If a vec object is empty, no buffer object is required. This means that
empty vecs are particularly efficient in a native context (usually 16 bytes).
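
The "usually 16 bytes" figure is consistent with two 8-byte fields: a 64-bit length and a
pointer to the buffer object, which can be null for an empty vec. The ``ctypes`` structure
below is a hypothetical picture of that layout, not ``librt``'s actual C definition. On a
64-bit platform it occupies 16 bytes::

    import ctypes


    class VecValueLayout(ctypes.Structure):
        """Hypothetical C layout of an unboxed vec value: (length, buffer)."""

        _fields_ = [
            ("length", ctypes.c_int64),  # number of items (i64)
            ("buffer", ctypes.c_void_p),  # pointer to the buffer object, or NULL
        ]


    empty = VecValueLayout(0, None)  # an empty vec needs no buffer allocation
    print(ctypes.sizeof(VecValueLayout))  # 16 on a typical 64-bit platform
    print(empty.length, empty.buffer)  # 0 None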

A packed representation is used for buffers with supported value item types, including for
nested vecs. The packed representation is much more efficient than a Python list object, and
it's also significantly more efficient than ``array.array`` for small sequences.

Multiple vec values can share the same underlying buffer. For example, assigning a vec
to another variable creates an alias that refers to the same buffer::

    from librt.vecs import append, vec
    from mypy_extensions import i32

    v = vec[i32]([1, 2, 3], capacity=3)
    w = v  # v and w share the same buffer

    w[0] = 99
    print(v[0])  # 99 -- both see the change

However, this sharing is not guaranteed to persist after operations that change
the length (such as ``append``). These may reallocate the buffer, silently breaking
the sharing::

    v = append(v, 4)  # reallocates the buffer since there is no free capacity
    v[0] = 0
    print(w[0])  # still 99 -- v and w no longer share a buffer

If you need independent copies, use slicing (``v[:]``) to explicitly create a vec with
its own buffer. Don't rely on the details of buffer reallocation, since they might
change between ``librt`` releases.

Using vecs outside compiled code
--------------------------------

``vec`` is fully supported in non-compiled code, but ``vec`` values are boxed in such
non-native contexts. There will always be two objects, a boxed vec object and a buffer object,
whereas in native contexts usually only the buffer is a dynamically allocated object.
``vec`` is primarily useful in code compiled using mypyc, and it has been heavily optimized
for this use case. In interpreted code, there may be no performance benefit over using
``list`` or ``array.array``.