Commit 6cb20f7

gh-151757: Support wide and combining characters in the curses module
The character-cell window methods now accept a full character cell -- a spacing character optionally followed by combining characters (up to CCHARW_MAX wide characters) -- in addition to a single int or byte character.

This affects addch(), bkgd(), bkgdset(), border(), box(), echochar(), hline(), insch() and vline(); they dispatch to the ncursesw wide-character functions (wadd_wch(), wbkgrnd(), wborder_set(), wecho_wchar(), whline_set(), wins_wch(), wvline_set(), ...) when given a string. border() and box() cannot mix integer or byte characters with wide string characters in a single call.

A cell is one spacing character optionally followed by combining characters, so an extra spacing or control character (such as "ab") is rejected with ValueError rather than being silently truncated by setcchar().

Also add the wide-character read methods get_wstr() and in_wstr(), the counterparts of getstr() and instr() that return a str rather than a bytes object, and the module functions erasewchar(), killwchar() and wunctrl(), the wide-character counterparts of erasechar(), killchar() and unctrl().

All of this is available only when built against the wide-character ncursesw library.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
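The cell rule described in the commit message can be approximated in pure Python. This is a rough sketch, not the commit's C implementation: `check_cell`, `ASSUMED_CCHARW_MAX` and `ZERO_WIDTH_CATEGORIES` are hypothetical names, the limit of 5 is an assumed typical ncurses value, and Unicode general categories stand in for whatever width test the module actually performs.

```python
import unicodedata

# Assumption: ncurses commonly defines CCHARW_MAX as 5; the real limit comes
# from the curses headers that Python was built against.
ASSUMED_CCHARW_MAX = 5

# Assumption: these general categories approximate "zero-width combining".
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def check_cell(text, max_chars=ASSUMED_CCHARW_MAX):
    """Return text if it looks like one character cell, else raise."""
    if not text or len(text) > max_chars:
        raise TypeError("expected 1 to %d code points" % max_chars)
    base, marks = text[0], text[1:]
    if not marks:
        # A lone character, including a control character, is accepted.
        return text
    if unicodedata.category(base) == "Cc":
        raise ValueError("a control character cannot take combining marks")
    for ch in marks:
        if unicodedata.category(ch) not in ZERO_WIDTH_CATEGORIES:
            raise ValueError("extra spacing or control character: %r" % ch)
    return text
```

Under these assumptions the sketch accepts `'e\u0301'` and `'\u263a\ufe0f'`, and rejects `'ab'` and `'a\n'` with ValueError, mirroring the cases exercised by the tests in this commit.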
1 parent 2e5843e commit 6cb20f7

6 files changed

Lines changed: 753 additions & 45 deletions


Doc/library/curses.rst

Lines changed: 92 additions & 0 deletions
@@ -194,6 +194,16 @@ The module :mod:`!curses` defines the following functions:
194194
the curses library itself.
195195

196196

197+
.. function:: erasewchar()
198+
199+
Return the user's current erase character as a one-character string.
200+
This is the wide-character variant of :func:`erasechar`. Availability
201+
depends on building Python against a wide-character-aware version of the
202+
underlying curses library.
203+
204+
.. versionadded:: next
205+
206+
197207
.. function:: filter()
198208

199209
The :func:`.filter` routine, if used, must be called before :func:`initscr` is
@@ -379,6 +389,16 @@ The module :mod:`!curses` defines the following functions:
379389
by the curses library itself.
380390

381391

392+
.. function:: killwchar()
393+
394+
Return the user's current line kill character as a one-character string.
395+
This is the wide-character variant of :func:`killchar`. Availability
396+
depends on building Python against a wide-character-aware version of the
397+
underlying curses library.
398+
399+
.. versionadded:: next
400+
401+
382402
.. function:: longname()
383403

384404
Return a bytes object containing the terminfo long name field describing the current
@@ -690,6 +710,18 @@ The module :mod:`!curses` defines the following functions:
690710
example as ``b'^C'``. Printing characters are left as they are.
691711

692712

713+
.. function:: wunctrl(ch)
714+
715+
Return a string which is a printable representation of the wide character *ch*.
716+
Control characters are represented as a caret followed by the character, for
717+
example as ``'^C'``. Printing characters are left as they are. This is the
718+
wide-character variant of :func:`unctrl`, returning a :class:`str` rather than
719+
:class:`bytes`. Availability depends on building Python against a
720+
wide-character-aware version of the underlying curses library.
721+
722+
.. versionadded:: next
723+
724+
693725
.. function:: ungetch(ch)
694726

695727
Push *ch* so the next :meth:`~window.getch` will return it.
@@ -770,12 +802,19 @@ Window objects
770802
character previously painted at that location. By default, the character
771803
position and attributes are the current settings for the window object.
772804

805+
*ch* may be a single character, optionally followed by combining
806+
characters, that together occupy one character cell.
807+
773808
.. note::
774809

775810
Writing outside the window, subwindow, or pad raises a :exc:`curses.error`.
776811
Attempting to write to the lower-right corner of a window, subwindow,
777812
or pad will cause an exception to be raised after the character is printed.
778813

814+
.. versionchanged:: next
815+
A character may now be given as a string of a base character followed
816+
by combining characters, instead of only a single character.
817+
779818

780819
.. method:: window.addnstr(str, n[, attr])
781820
window.addnstr(y, x, str, n[, attr])
@@ -834,6 +873,9 @@ Window objects
834873
* Wherever the former background character appears, it is changed to the new
835874
background character.
836875

876+
.. versionchanged:: next
877+
Wide and combining characters are now accepted.
878+
837879

838880
.. method:: window.bkgdset(ch[, attr])
839881

@@ -844,6 +886,9 @@ Window objects
844886
characters. The background becomes a property of the character and moves with
845887
the character through any scrolling and insert/delete line/character operations.
846888

889+
.. versionchanged:: next
890+
Wide and combining characters are now accepted.
891+
847892

848893
.. method:: window.border([ls[, rs[, ts[, bs[, tl[, tr[, bl[, br]]]]]]]])
849894

@@ -877,12 +922,20 @@ Window objects
877922
| *br* | Bottom-right corner | :const:`ACS_LRCORNER` |
878923
+-----------+---------------------+-----------------------+
879924

925+
.. versionchanged:: next
926+
Wide and combining characters are now accepted. A single call cannot mix
927+
them with integer or byte characters.
928+
880929

881930
.. method:: window.box([vertch, horch])
882931

883932
Similar to :meth:`border`, but both *ls* and *rs* are *vertch* and both *ts* and
884933
*bs* are *horch*. The default corner characters are always used by this function.
885934

935+
.. versionchanged:: next
936+
Wide and combining characters are now accepted. A single call cannot mix
937+
them with integer or byte characters.
938+
886939

887940
.. method:: window.chgat(attr)
888941
window.chgat(num, attr)
@@ -951,6 +1004,9 @@ Window objects
9511004
Add character *ch* with attribute *attr*, and immediately call :meth:`refresh`
9521005
on the window.
9531006

1007+
.. versionchanged:: next
1008+
Wide and combining characters are now accepted.
1009+
9541010

9551011
.. method:: window.enclose(y, x)
9561012

@@ -1038,6 +1094,20 @@ Window objects
10381094
The maximum value for *n* was increased from 1023 to 2047.
10391095

10401096

1097+
.. method:: window.get_wstr()
1098+
window.get_wstr(n)
1099+
window.get_wstr(y, x)
1100+
window.get_wstr(y, x, n)
1101+
1102+
Read a string from the user, with primitive line editing capacity.
1103+
This is the wide-character variant of :meth:`getstr`: it returns a
1104+
:class:`str` rather than a :class:`bytes` object, so it can return
1105+
characters that are not representable in the window's encoding.
1106+
At most *n* characters are read; *n* defaults to and cannot exceed 2047.
1107+
1108+
.. versionadded:: next
1109+
1110+
10411111
.. method:: window.getyx()
10421112

10431113
Return a tuple ``(y, x)`` of current cursor position relative to the window's
@@ -1051,6 +1121,9 @@ Window objects
10511121
the character *ch* with attributes *attr*. The line stops at the right edge
10521122
of the window if fewer than *n* cells are available.
10531123

1124+
.. versionchanged:: next
1125+
Wide and combining characters are now accepted.
1126+
10541127

10551128
.. method:: window.idcok(flag)
10561129

@@ -1088,6 +1161,9 @@ Window objects
10881161
cursor are shifted one position right, with the rightmost character on the
10891162
line being lost. The cursor position does not change.
10901163

1164+
.. versionchanged:: next
1165+
Wide and combining characters are now accepted.
1166+
10911167

10921168
.. method:: window.insdelln(nlines)
10931169

@@ -1137,6 +1213,19 @@ Window objects
11371213
The maximum value for *n* was increased from 1023 to 2047.
11381214

11391215

1216+
.. method:: window.in_wstr([n])
1217+
window.in_wstr(y, x[, n])
1218+
1219+
Return a string of characters, extracted from the window starting at the
1220+
current cursor position, or at *y*, *x* if specified. This is the
1221+
wide-character variant of :meth:`instr`: it returns a :class:`str` rather
1222+
than a :class:`bytes` object, so it can return characters that are not
1223+
representable in the window's encoding. Attributes and color information
1224+
are stripped from the characters. The maximum value for *n* is 2047.
1225+
1226+
.. versionadded:: next
1227+
1228+
11401229
.. method:: window.is_linetouched(line)
11411230

11421231
Return ``True`` if the specified line was modified since the last call to
@@ -1386,6 +1475,9 @@ Window objects
13861475
Display a vertical line starting at ``(y, x)`` with length *n* consisting of the
13871476
character *ch* with attributes *attr*.
13881477

1478+
.. versionchanged:: next
1479+
Wide and combining characters are now accepted.
1480+
13891481

13901482
Constants
13911483
---------
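The documented restriction that border() and box() cannot mix wide strings with integer or byte characters suggests picking all frame characters from one kind. A minimal sketch follows, assuming a window-like object `win` is passed in; `frame_chars` and `draw_demo` are hypothetical helper names, and the `wide` flag is something the caller has to determine (for example with `hasattr(win, "in_wstr")`, which is an assumption about how to detect this feature, not a documented guarantee).

```python
def frame_chars(wide):
    """Return (vertch, horch) of one consistent kind for box()."""
    if wide:
        # Box-drawing characters passed as str (wide-character path).
        return "\u2502", "\u2500"
    # Plain integer characters (narrow path).
    return ord("|"), ord("-")


def draw_demo(win, wide):
    """Draw a frame, one cell and a short rule on a window-like object."""
    vertch, horch = frame_chars(wide)
    win.box(vertch, horch)
    # With wide support, a base character plus a combining mark is one cell.
    win.addch(1, 1, "e\u0301" if wide else "e")
    win.hline(2, 1, horch, 5)
```

On a real terminal this would typically be driven through `curses.wrapper()`, passing the standard screen as `win`.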

Doc/whatsnew/3.16.rst

Lines changed: 19 additions & 0 deletions
@@ -89,6 +89,25 @@ Improved modules
8989
curses
9090
------
9191

92+
* The :mod:`curses` character-cell window methods now accept a full character
93+
cell --- a spacing character optionally followed by combining characters ---
94+
in addition to a single integer or byte character. This affects
95+
:meth:`~curses.window.addch`, :meth:`~curses.window.bkgd`,
96+
:meth:`~curses.window.bkgdset`, :meth:`~curses.window.border`,
97+
:meth:`~curses.window.box`, :meth:`~curses.window.echochar`,
98+
:meth:`~curses.window.hline`, :meth:`~curses.window.insch` and
99+
:meth:`~curses.window.vline`.
100+
Also add the wide-character read methods :meth:`~curses.window.get_wstr` and
101+
:meth:`~curses.window.in_wstr`, the counterparts of
102+
:meth:`~curses.window.getstr` and :meth:`~curses.window.instr` that return a
103+
:class:`str` rather than :class:`bytes`,
104+
and the module functions :func:`curses.erasewchar`, :func:`curses.killwchar`
105+
and :func:`curses.wunctrl`, the wide-character counterparts of
106+
:func:`curses.erasechar`, :func:`curses.killchar` and :func:`curses.unctrl`.
107+
These features are only available when built against the wide-character
108+
ncursesw library.
109+
(Contributed by Serhiy Storchaka in :gh:`151757`.)
110+
92111
* Add :func:`curses.nofilter`, which undoes the effect of :func:`curses.filter`.
93112
(Contributed by Serhiy Storchaka in :gh:`151744`.)
94113

Lib/test/test_curses.py

Lines changed: 107 additions & 9 deletions
@@ -253,6 +253,69 @@ def test_refresh_control(self):
253253
self.assertIs(win.is_wintouched(), syncok)
254254
self.assertIs(stdscr.is_wintouched(), syncok)
255255

256+
@requires_curses_window_meth('get_wch')
257+
def test_addch_combining(self):
258+
# A character cell may hold a spacing char plus combining marks.
259+
stdscr = self.stdscr
260+
stdscr.move(0, 0)
261+
stdscr.addch('e\u0301') # 'e' + COMBINING ACUTE ACCENT
262+
stdscr.addch(1, 0, 'a\u0323\u0300') # base plus two combining marks
263+
# Too many code points to fit in a single character cell.
264+
self.assertRaises(TypeError, stdscr.addch, 'e' + '\u0301' * 10)
265+
# Only the first code point may be a spacing character.
266+
self.assertRaises(ValueError, stdscr.addch, 'ab')
267+
self.assertRaises(ValueError, stdscr.addch, 'a\u0301b')
268+
# A lone control character is allowed (like addch(ord('\n'))), but it
269+
# cannot be combined with other characters, as base or otherwise.
270+
stdscr.addch('\n')
271+
self.assertRaises(ValueError, stdscr.addch, 'a\n')
272+
self.assertRaises(ValueError, stdscr.addch, '\n\u0301')
273+
self.assertRaises(ValueError, stdscr.addch, '\ne\u0301')
274+
275+
@requires_curses_window_meth('get_wch')
276+
def test_addch_emoji(self):
277+
# curses has no grapheme-cluster support: a cell holds one spacing
278+
# character plus zero-width combining characters. A lone emoji fits,
279+
# as does an emoji with a zero-width variation selector.
280+
stdscr = self.stdscr
281+
stdscr.addch(0, 0, '\U0001f600') # single emoji
282+
stdscr.addch(1, 0, '\u263a\ufe0f') # WHITE SMILING FACE + VS-16
283+
# An emoji ZWJ sequence or an emoji with a modifier is more than one
284+
# spacing character and cannot share a single cell.
285+
self.assertRaises(ValueError, stdscr.addch,
286+
'\U0001f44d\U0001f3fd') # thumbs up + skin tone
287+
self.assertRaises(ValueError, stdscr.addch,
288+
'\U0001f468\u200d\U0001f469') # man ZWJ woman
289+
290+
@requires_curses_window_meth('get_wch')
291+
def test_wide_characters(self):
292+
# Wide and combining characters in the character-cell methods.
293+
stdscr = self.stdscr
294+
combining = 'e\u0301' # 'e' + COMBINING ACUTE ACCENT
295+
vline, hline = '\u2502', '\u2500' # box-drawing vertical/horizontal
296+
stdscr.move(0, 0)
297+
stdscr.echochar(combining)
298+
stdscr.insch(1, 0, combining)
299+
stdscr.hline(2, 0, hline, 5)
300+
stdscr.vline(3, 0, vline, 3)
301+
stdscr.bkgdset(combining)
302+
stdscr.bkgd(combining)
303+
stdscr.border(vline, vline, hline, hline)
304+
stdscr.box(vline, hline)
305+
# border() and box() cannot mix integer and wide-string characters.
306+
self.assertRaises(TypeError, stdscr.box, vline, ord('-'))
307+
308+
309+
@requires_curses_window_meth('in_wstr')
310+
def test_in_wstr(self):
311+
# The wide-character window read returns a str (instr returns bytes).
312+
stdscr = self.stdscr
313+
s = 'a\u00e9\u2502z' # 'a', 'e'+acute (precomposed), box vline, 'z'
314+
stdscr.addstr(0, 0, s)
315+
self.assertEqual(stdscr.in_wstr(0, 0, len(s)), s)
316+
self.assertIsInstance(stdscr.instr(0, 0, len(s)), bytes)
317+
318+
256319
def test_output_character(self):
257320
stdscr = self.stdscr
258321
encoding = stdscr.encoding
@@ -281,13 +344,16 @@ def test_output_character(self):
281344
stdscr.echochar('A')
282345
stdscr.echochar(b'A')
283346
stdscr.echochar(65)
284-
with self.assertRaises((UnicodeEncodeError, OverflowError)):
285-
# Unicode is not fully supported yet, but at least it does
286-
# not crash.
287-
# It is supposed to fail because either the character is
288-
# not encodable with the current encoding, or it is encoded to
289-
# a multibyte sequence.
290-
stdscr.echochar('\u0114')
347+
c = '\u0114'
348+
try:
349+
stdscr.echochar(c)
350+
except UnicodeEncodeError:
351+
# The character is not encodable with the current encoding.
352+
self.assertRaises(UnicodeEncodeError, c.encode, encoding)
353+
except OverflowError:
354+
# The character is encoded to a multibyte sequence.
355+
encoded = c.encode(encoding)
356+
self.assertNotEqual(len(encoded), 1, repr(encoded))
291357
stdscr.echochar('A', curses.A_BOLD)
292358
self.assertIs(stdscr.is_wintouched(), False)
293359

@@ -742,7 +808,6 @@ def test_borders_and_lines(self):
742808
self.assertEqual(win.inch(3, 1), b'a'[0])
743809

744810
def test_unctrl(self):
745-
# TODO: wunctrl()
746811
self.assertEqual(curses.unctrl(b'A'), b'A')
747812
self.assertEqual(curses.unctrl('A'), b'A')
748813
self.assertEqual(curses.unctrl(65), b'A')
@@ -753,6 +818,21 @@ def test_unctrl(self):
753818
self.assertRaises(TypeError, curses.unctrl, b'AB')
754819
self.assertRaises(TypeError, curses.unctrl, '')
755820
self.assertRaises(TypeError, curses.unctrl, 'AB')
821+
822+
@requires_curses_func('wunctrl')
823+
def test_wunctrl(self):
824+
# The wide-character variant of unctrl() returns a str.
825+
self.assertEqual(curses.wunctrl(b'A'), 'A')
826+
self.assertEqual(curses.wunctrl('A'), 'A')
827+
self.assertEqual(curses.wunctrl(65), 'A')
828+
self.assertEqual(curses.wunctrl('\n'), '^J')
829+
self.assertEqual(curses.wunctrl(10), '^J')
830+
self.assertEqual(curses.wunctrl('é'), 'é') # printable
831+
self.assertRaises(TypeError, curses.wunctrl, b'')
832+
self.assertRaises(TypeError, curses.wunctrl, b'AB')
833+
self.assertRaises(TypeError, curses.wunctrl, '')
834+
# More than one spacing character is not a single cell.
835+
self.assertRaises(ValueError, curses.wunctrl, 'AB')
756836
self.assertRaises(OverflowError, curses.unctrl, 2**64)
757837

758838
def test_endwin(self):
@@ -800,7 +880,7 @@ def test_misc_module_funcs(self):
800880
curses.newpad(50, 50)
801881

802882
def test_env_queries(self):
803-
# TODO: term_attrs(), erasewchar(), killwchar()
883+
# TODO: term_attrs()
804884
self.assertIsInstance(curses.termname(), bytes)
805885
self.assertIsInstance(curses.longname(), bytes)
806886
self.assertIsInstance(curses.baudrate(), int)
@@ -815,6 +895,24 @@ def test_env_queries(self):
815895
self.assertIsInstance(c, bytes)
816896
self.assertEqual(len(c), 1)
817897

898+
# The erase and kill characters are a property of the controlling
899+
# terminal: the wide variants report ERR (raising curses.error) without
900+
# one, while the narrow variants above return an unspecified byte.
901+
try:
902+
tty_fd = os.open(os.ctermid(), os.O_RDONLY)
903+
except OSError:
904+
tty_fd = None
905+
if tty_fd is not None:
906+
os.close(tty_fd)
907+
if hasattr(curses, 'erasewchar'):
908+
c = curses.erasewchar()
909+
self.assertIsInstance(c, str)
910+
self.assertEqual(len(c), 1)
911+
if hasattr(curses, 'killwchar'):
912+
c = curses.killwchar()
913+
self.assertIsInstance(c, str)
914+
self.assertEqual(len(c), 1)
915+
818916
def test_output_options(self):
819917
stdscr = self.stdscr
820918
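The contrast checked in test_in_wstr (in_wstr() returns a str, instr() returns bytes) can be wrapped so callers always get text. This is a minimal sketch, assuming a window-like object with the documented `in_wstr(y, x, n)`, `instr(y, x, n)` and `encoding` members; `read_text` is a hypothetical helper name and the `errors="replace"` policy is an illustrative choice.

```python
def read_text(win, y, x, n):
    """Read up to n characters at (y, x) and return them as str."""
    if hasattr(win, "in_wstr"):
        # Wide-character build: already a str, no decoding needed.
        return win.in_wstr(y, x, n)
    # Narrow fallback: instr() returns bytes in the window's encoding.
    return win.instr(y, x, n).decode(win.encoding, errors="replace")
```

The fallback can only return what the window's encoding is able to represent, which is the limitation the new wide-character read methods are meant to lift.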

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
The :mod:`curses` character-cell window methods now accept a full character
2+
cell -- a spacing character optionally followed by combining characters -- in
3+
addition to a single integer or byte character. Add the wide-character read
4+
methods :meth:`curses.window.get_wstr` and :meth:`curses.window.in_wstr`, and
5+
the functions :func:`curses.erasewchar`, :func:`curses.killwchar` and
6+
:func:`curses.wunctrl`. These features are only available when built against
7+
the wide-character ncursesw library.
