GH-50355: [C++][Gandiva] fix out-of-bounds read in utf8_length_ignore_invalid#50356

Open
Arawoof06 wants to merge 1 commit into apache:main from Arawoof06:utf8-ignore-invalid-overread

@Arawoof06

Rationale for this change

utf8_length_ignore_invalid can extend char_len while it scans the continuation bytes of a glyph, and it never rechecks the buffer end inside that scan. An input ending in a truncated multi-byte UTF-8 sequence (for example, a 0xF0 lead byte, which announces a 4-byte glyph, followed by non-continuation bytes) therefore makes it read past data_len. The function is reachable from untrusted string data through lpad/rpad, which count the input glyphs before padding. Reproduced against a verbatim copy of the function under AddressSanitizer, using the 4-byte input {0xF0, 'a', 'a', 'a'} in an exactly-sized heap buffer; ASan reports heap-buffer-overflow READ ... 0 bytes after 4-byte region.
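A minimal sketch of the failure, assuming the loop shape implied by the description above: a lead-byte length, a fallback to 1 for an invalid or incomplete glyph, and one extra byte of scan for every non-continuation byte met. This is a reconstruction, not the Gandiva source; utf8_char_length_sketch and first_out_of_bounds_index are hypothetical names. Instead of actually over-reading, it returns the first index at or past data_len that the unguarded inner loop would touch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the lead-byte length helper: the glyph length a
// UTF-8 lead byte announces, or 0 for a byte that cannot start a glyph.
static int32_t utf8_char_length_sketch(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 0;
}

// Assumed shape of the unguarded scan. Where the unguarded loop would read
// data[i + j] with i + j >= data_len, this sketch returns that index instead.
// Returns -1 if the scan stays inside the buffer.
static int32_t first_out_of_bounds_index(const char* data, int32_t data_len) {
  for (int32_t i = 0; i < data_len; i++) {
    int32_t char_len = utf8_char_length_sketch(static_cast<unsigned char>(data[i]));
    if (char_len == 0 || i + char_len > data_len) {
      char_len = 1;  // invalid lead byte or glyph that cannot fit: one byte
    }
    for (int32_t j = 1; j < char_len; ++j) {
      if (i + j >= data_len) return i + j;  // the unguarded loop reads here
      if ((static_cast<unsigned char>(data[i + j]) & 0xC0) != 0x80) {
        char_len += 1;  // non-continuation byte: the scan grows by one byte
      }
    }
    i += char_len - 1;  // skip the rest of the glyph
  }
  return -1;
}
```

For {0xF0, 'a', 'a', 'a'} with data_len 4, the lead byte passes the i + char_len <= data_len check with char_len 4, each 'a' then bumps char_len by one, and the scan reaches index 4, the first byte after the buffer. That matches the "0 bytes after 4-byte region" report.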

What changes are included in this PR?

Bound the continuation-byte scan with i + j < data_len, the same guard that the sibling helpers (utf8_length, reverse_utf8, utf8_byte_pos) already use. Valid input is unaffected: a well-formed glyph always satisfies i + char_len <= data_len, so the new guard never cuts its scan short.
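The guarded loop can be sketched the same way. This is again a hypothetical reconstruction (utf8_length_ignore_invalid_sketch is not the Gandiva function), assuming the fix adds i + j < data_len as a second condition on the inner loop and leaves the rest of the loop unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the lead-byte length helper: the glyph length a
// UTF-8 lead byte announces, or 0 for a byte that cannot start a glyph.
static int32_t utf8_char_length_sketch(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 0;
}

// Sketch of the patched glyph counter: the inner loop also requires
// i + j < data_len, so it never reads data[data_len] or beyond, however far
// char_len has been extended by non-continuation bytes.
static int32_t utf8_length_ignore_invalid_sketch(const char* data, int32_t data_len) {
  int32_t count = 0;
  for (int32_t i = 0; i < data_len; i++) {
    int32_t char_len = utf8_char_length_sketch(static_cast<unsigned char>(data[i]));
    if (char_len == 0 || i + char_len > data_len) {
      char_len = 1;  // invalid lead byte or glyph that cannot fit: one byte
    }
    for (int32_t j = 1; j < char_len && i + j < data_len; ++j) {
      if ((static_cast<unsigned char>(data[i + j]) & 0xC0) != 0x80) {
        char_len += 1;  // non-continuation byte: extend the scan by one byte
      }
    }
    i += char_len - 1;  // skip the rest of the glyph (may jump past data_len)
    count++;
  }
  return count;
}
```

Under this sketch the malformed 4-byte input is counted as a single glyph and the scan stops at the buffer end, while a well-formed string such as "h\xC3\xA9llo" still counts 5 glyphs, because for valid glyphs the added condition is always true.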

Are these changes tested?

Yes. Added TestStringOps.TestPadMalformedUtf8NoOverread, which runs lpad/rpad on the truncated multi-byte input placed in an exactly-sized heap buffer so the over-read trips ASAN; the existing pad tests still pass.
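The test depends on placing the input in an allocation of exactly its own length. A minimal sketch of that technique, independent of Gandiva's test harness (RunOnExactHeapCopy is a hypothetical helper, not part of the PR): passing a std::string's data() directly would hide a one-byte over-read, because data() is always followed by a terminating null character that the string owns, so AddressSanitizer has nothing to report.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>

// Copies `bytes` into a heap block of exactly bytes.size() bytes and calls
// fn(ptr, len) on it. Under AddressSanitizer, any read at index >= len is then
// reported as heap-buffer-overflow instead of landing on owned slack memory.
// fn must not keep pointers into the block: it is freed when this returns.
template <typename Fn>
auto RunOnExactHeapCopy(const std::string& bytes, Fn&& fn) {
  std::unique_ptr<char[]> buf(new char[bytes.size()]);
  std::memcpy(buf.get(), bytes.data(), bytes.size());
  const char* ptr = buf.get();
  return fn(ptr, static_cast<int32_t>(bytes.size()));
}
```

In a test like the one described, the callable would wrap the lpad/rpad call; with the unguarded helper, an ASan build should then flag the read of the first byte past the copy.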

Are there any user-facing changes?

No.

This PR contains a "Critical Fix": it fixes an out-of-bounds read in the Gandiva UTF-8 length helper (utf8_length_ignore_invalid), reachable from lpad/rpad on crafted string data.

@github-actions

github-actions Bot commented Jul 3, 2026

⚠️ GitHub issue #50355 has been automatically assigned in GitHub to PR creator.
