Commit 4e7ae36

JukkaL and p-sawicki authored
[mypyc] Document librt.vecs (#21437)
Co-authored-by: Piotr Sawicki <sawickipiotr@outlook.com>
1 parent d0b3fb3 commit 4e7ae36

3 files changed

Lines changed: 256 additions & 0 deletions

mypyc/doc/index.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ generate fast code.
    librt_base64
    librt_strings
    librt_time
+   librt_vecs
 
 .. toctree::
    :maxdepth: 2

mypyc/doc/librt.rst

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ Follow submodule links in the table to a detailed description of each submodule.
      - String and bytes utilities
    * - :doc:`librt.time <librt_time>`
      - Time utilities
+   * - :doc:`librt.vecs <librt_vecs>`
+     - Fast growable array type ``vec``
 
 Installing librt
 ----------------

mypyc/doc/librt_vecs.rst

Lines changed: 253 additions & 0 deletions
@@ -0,0 +1,253 @@
librt.vecs
==========

The ``librt.vecs`` module defines the ``vec`` type, a low-level, uniform growable array type.
It's part of the ``librt`` package on PyPI.

When constructing a ``vec``, the item type ``T`` is always explicitly given via ``vec[T]``::

    from librt.vecs import append, vec

    v = vec[float]([1.0, 2.5])  # Construct vec[float] with two items

``vec`` supports many sequence operations, though it's not a full sequence type::

    len(v)  # 2
    v[0]  # 1.0
    v[-1]  # 2.5
    for x in v:
        print(x)

The length of each ``vec`` value is immutable. Appending an item is still a fast operation,
but it returns a new ``vec`` value::

    v = append(v, -0.5)
    print(v)  # vec[float]([1.0, 2.5, -0.5])

``vec`` only supports simple, uniform item types. It uses an efficient packed binary encoding
for these *value item types*:

* ``mypy_extensions.i64`` (signed 64-bit integer)
* ``mypy_extensions.i32`` (signed 32-bit integer)
* ``mypy_extensions.i16`` (signed 16-bit integer)
* ``mypy_extensions.u8`` (unsigned byte)
* ``float`` (64-bit float)
* ``bool``

``int`` is not a valid item type, since it has arbitrary precision, and vec is an
efficiency-focused type. Use one of the fixed-width integer types instead.
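
To make the fixed-width restriction concrete, the following sketch uses only the standard
library to compute the value range implied by each integer item type listed above, and shows
that an arbitrary-precision ``int`` can exceed all of them. The ``value_range`` and ``fits``
helpers are hypothetical and not part of ``librt``; how ``vec`` itself reports an out-of-range
value is not covered here::

    # Bit widths and signedness of the integer item types listed above.
    INT_ITEM_TYPES = {
        "i64": (64, True),
        "i32": (32, True),
        "i16": (16, True),
        "u8": (8, False),
    }


    def value_range(bits: int, signed: bool) -> tuple[int, int]:
        """Return the inclusive (min, max) range of a fixed-width integer."""
        if signed:
            return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        return 0, (1 << bits) - 1


    def fits(value: int, type_name: str) -> bool:
        """Hypothetical helper: can value be stored in the named item type?"""
        low, high = value_range(*INT_ITEM_TYPES[type_name])
        return low <= value <= high


    big = 2 ** 100  # fine for Python's int, too wide for any packed item type
    for name, spec in INT_ITEM_TYPES.items():
        print(name, value_range(*spec), fits(big, name))
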

Class item types (e.g. ``str`` or ``MyNativeClass``) are represented as regular object references.
Optional class item types (e.g. ``str | None``) are supported for convenience, but arbitrary
union types are not supported as item types. Nested vecs are supported, e.g. ``vec[vec[i64]]``.

A vec value is often used as an efficient alternative to ``list`` or ``array.array`` in code
compiled using mypyc. Its primary advantages are an efficient packed memory representation
for value item types and very fast inlined get and set item operations.

Vec instances perform runtime checking of item types. Since values of type variables are
not available at runtime (they are *erased*), type variables can't be used as item types.

A vec value is effectively an immutable (length, buffer) pair. This means that any operation
that changes the length of a vec, including ``append`` as we saw above, returns a modified
value.

.. note::
   An immutable length allows more efficient code to be generated by mypyc, and vec values
   can be allocated to machine registers effectively. However, vec values must be boxed
   if used in a non-native context, such as if added to a list or dict.

Here are some examples of valid vec types:

.. list-table::
   :header-rows: 1

   * - Type
     - Item representation
   * - ``vec[i32]``
     - Packed 32-bit integers
   * - ``vec[float]``
     - Packed 64-bit floats
   * - ``vec[str]``
     - Object references
   * - ``vec[vec[u8]]``
     - Packed vec values

The ``vec`` class
-----------------

.. class:: vec[T](items: Iterable[T] = ..., *, capacity: i64 = ...)

   A generic growable array type. The runtime type parameter ``T`` used when
   calling ``vec[T](...)`` determines the element type.

   The ``capacity`` parameter allows defining the minimum initial
   capacity of the buffer, some of which may be unused after
   construction. Unused capacity allows fast ``append`` and ``extend``
   operations that don't need to reallocate the buffer. Actual capacity
   will be larger than ``capacity`` if ``items`` has more than ``capacity``
   items.

   Construction from ``list`` and ``tuple`` objects is optimized.
   Also, for value item types, construction from an object that implements
   the buffer protocol is optimized (such as ``bytes``), if the format
   is compatible with the vec item type.

   Mypyc treats ``vec[T]([x] * n)`` as a special form. For example,
   ``vec[u8]([0] * n)`` constructs a zero-initialized vec object
   efficiently, without building an intermediate list. There are
   also other constructor-related special forms -- see `Special
   forms`_ below.

   It's an error to construct a ``vec`` object without providing an
   item type: ``vec()`` raises an exception.

   .. describe:: len(v) → i64

      Return the length of ``v``.

   .. describe:: v[i] → T

      Return item at index ``i`` (index may be negative).

   .. describe:: v[i:j] → vec[T]

      Return a slice. This constructs a new ``vec`` object. ``i`` and ``j`` may be negative.

   .. describe:: v[i] = o

      Assign to an item (index may be negative).

   .. describe:: o in v → bool

      Return True if ``v`` contains ``o``.

   .. describe:: for o in v

      Iterate over items.

   .. describe:: memoryview(v)

      ``vec`` implements the buffer protocol, but only for value item types that use a
      packed representation.
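
The buffer protocol support described above depends on element formats. As a rough
illustration using only standard-library types, the sketch below inspects the format code and
item size that a few buffer objects expose. The ``describe`` helper is hypothetical, and which
formats ``librt`` accepts for each item type is not listed in this document, so the pairing
with item types in the comments is an assumption::

    from array import array


    def describe(obj) -> tuple[str, int]:
        """Return (format, itemsize) as exposed through the buffer protocol."""
        with memoryview(obj) as view:
            return view.format, view.itemsize


    # Assumed pairings: "B" with u8, "d" with float, "q" with i64.
    byte_info = describe(b"foo")                  # unsigned bytes
    double_info = describe(array("d", [1.0, 2.5]))  # C doubles
    int64_info = describe(array("q", [1, 2]))     # signed 64-bit integers

    print(byte_info, double_info, int64_info)
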

Functions
---------

Since the following operations return a modified value, they are module-level functions
instead of methods.

.. function:: append(v: vec[T], o: T) -> vec[T]

   Return ``v`` with item ``o`` appended to it. If ``v`` has unused capacity, reuse
   the existing buffer. The time complexity is O(1) on average. Example::

       v = vec[i32]()
       v = append(v, 1)

.. function:: extend(v: vec[T], it: Iterable[T]) -> vec[T]

   Return ``v`` with all items from iterable ``it`` appended to it. If ``v`` has sufficient
   unused capacity, reuse the existing buffer. The time complexity is O(n) on average,
   where n is the length of ``it``. Example::

       v = vec[u8]()
       v = extend(v, b"foo")

.. function:: remove(v: vec[T], o: T) -> vec[T]

   Return ``v`` with the first instance of item ``o`` removed. Reuse the buffer
   from ``v``. Raise ``ValueError`` if value doesn't exist. Example::

       v = vec[i32]([1, 2, 3])
       v = remove(v, 2)
       # v has items [1, 3]

.. function:: pop(v: vec[T], i: i64 = -1) -> tuple[vec[T], T]

   Return ``(new_v, item)``, where ``item`` is the value at index ``i`` and
   ``new_v`` is ``v`` with that item removed. Reuse the buffer from ``v``.
   Example::

       v = vec[i32]([1, 2, 3])
       v, x = pop(v)
       # x is 3; v has items [1, 2]

Special forms
-------------

Certain combinations of operations that would be multiple separate operations in
regular Python are guaranteed to be compiled by mypyc to direct operations
with no unnecessary temporary objects.

.. list-table::
   :header-rows: 1

   * - Special form
     - Description
   * - ``vec[T]()``
     - Construct empty vec with no buffer. This doesn't perform any dynamic allocation
       (at least for non-nested vecs).
   * - ``vec[T]([element1, ...])``
     - Directly construct a vec object with given items, without a temporary list.
   * - ``vec[T]([element1] * n)``
     - Directly construct a vec with length n, without any temporary list.
   * - ``vec[T]([<expr> for ... in <expr>])``
     - Vec comprehension creates no temporary list.

Thread safety
-------------

In free-threaded Python builds, it's unsafe to write or modify an item if other
threads might be concurrently accessing *the same item*. For example, writing ``v[4]``
is not safe if another thread might be reading ``v[4]``. Similarly, it is not safe for two
threads to concurrently call ``append`` or ``remove`` on the same vec object.

This is different from list objects, since vec is a lower-level type where implicit
synchronization would have a significant performance cost. However, since vec lengths
are immutable, some race conditions that lists can be susceptible to are not possible
with vecs.
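
One common way to satisfy these rules is to route every access to a shared value through a
lock. The sketch below is only an illustration of that pattern: it uses a tuple as a stand-in
for a vec value (both are replaced rather than resized on append), and the ``SharedLog`` class
and ``worker`` function are hypothetical names. With a real vec, the tuple concatenation would
presumably become ``self._items = append(self._items, item)``::

    import threading


    class SharedLog:
        """Holds a sequence value that is replaced, not resized, on every append."""

        def __init__(self) -> None:
            self._lock = threading.Lock()
            self._items: tuple = ()  # stand-in for a vec value

        def add(self, item: int) -> None:
            # The whole read-modify-write step happens while holding the lock.
            with self._lock:
                self._items = self._items + (item,)

        def snapshot(self) -> tuple:
            with self._lock:
                return self._items


    def worker(log: SharedLog, start: int, count: int) -> None:
        for i in range(start, start + count):
            log.add(i)


    log = SharedLog()
    threads = [
        threading.Thread(target=worker, args=(log, base, 1000))
        for base in (0, 1000, 2000, 3000)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(log.snapshot()))  # 4000
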

Implementation details
----------------------

In a native context, such as in a local variable or a parameter in a native function,
or in an attribute of a native class, vec values are implemented as value objects with two
fields: length and buffer. The buffer is a normal Python object, but it's not directly
accessible to users. If a vec object is empty, no buffer object is required. This means that
empty vecs are particularly efficient in a native context (usually 16 bytes).

A packed representation is used for buffers with supported value item types, including for
nested vecs. The packed representation is much more efficient than a Python list object, and
it's also significantly more efficient than ``array.array`` for small sequences.
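
To give a feel for why packed storage is smaller than a list of object references, the sketch
below compares the two using only the standard library. It uses ``array.array`` as a stand-in
for a packed layout, so it does not measure ``vec`` itself and says nothing about the
difference between ``vec`` and ``array.array``; exact sizes depend on the Python build::

    import sys
    from array import array

    n = 1000
    values = [2 ** 40 + i for i in range(n)]  # each value needs more than 32 bits

    packed = array("q", values)  # signed 64-bit items stored inline
    packed_payload = len(packed) * packed.itemsize

    # A list stores one pointer per item plus a separate int object for each item.
    list_total = sys.getsizeof(values) + sum(sys.getsizeof(x) for x in values)

    print(packed_payload)               # 8000
    print(list_total > packed_payload)  # True
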

Multiple vec values can share the same underlying buffer. For example, assigning a vec
to another variable creates an alias that refers to the same buffer::

    v = vec[i32]([1, 2, 3], capacity=3)
    w = v  # v and w share the same buffer

    w[0] = 99
    print(v[0])  # 99 -- both see the change

However, this sharing is not guaranteed to persist if there are operations that change
the length (such as ``append``). These may reallocate the buffer, breaking the sharing
silently::

    v = append(v, 4)  # reallocates the buffer since there is no free capacity
    v[0] = 0
    print(w[0])  # still 99 -- v and w no longer share a buffer

If you need independent copies, use slicing (``v[:]``) to explicitly create a vec with
its own buffer. It's not recommended to rely on the details of buffer reallocation,
as these might change between ``librt`` releases.
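
The sharing behavior described above can be mimicked with a small pure-Python toy model of a
(length, buffer) pair. This is only a sketch for building intuition: ``ToyVec`` and
``toy_append`` are hypothetical names, the doubling growth policy is an arbitrary assumption,
and none of this reflects how ``librt`` actually implements ``vec``::

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class ToyVec:
        """Toy stand-in for a vec value: an immutable (length, buffer) pair."""

        length: int
        buffer: list  # fixed-capacity storage; len(buffer) is the capacity


    def toy_append(v: ToyVec, item: object) -> ToyVec:
        """Return a new ToyVec with item appended, reusing the buffer if possible."""
        if v.length < len(v.buffer):
            v.buffer[v.length] = item  # unused capacity: write in place
            return ToyVec(v.length + 1, v.buffer)
        # No free capacity: copy into a larger buffer (growth factor is arbitrary).
        new_capacity = max(1, 2 * len(v.buffer))
        new_buffer = v.buffer[: v.length] + [None] * (new_capacity - v.length)
        new_buffer[v.length] = item
        return ToyVec(v.length + 1, new_buffer)


    v = ToyVec(3, [1, 2, 3])     # full buffer, similar to capacity=3
    w = v                        # alias: same buffer object
    w.buffer[0] = 99
    print(v.buffer[0])           # 99

    v = toy_append(v, 4)         # the buffer was full, so a new one is allocated
    v.buffer[0] = 0
    print(w.buffer[0])           # 99
    print(v.buffer is w.buffer)  # False

In this model, appending to a value that still has unused capacity returns a new pair that
shares the old buffer, which is why aliasing may or may not survive an ``append``.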

Using vecs outside compiled code
--------------------------------

``vec`` is fully supported in non-compiled code, but ``vec`` values will be boxed in such
non-native contexts. There will always be two objects, a boxed vec object and a buffer object,
whereas in native contexts usually only the buffer is a dynamically allocated object.
``vec`` is primarily useful in code compiled using mypyc, and it's been heavily optimized
for this use case. There may be no performance benefit in interpreted code over using
``list`` or ``array.array``.
