XML.toJSONObject() throws uncaught IllegalArgumentException for out-of-range numeric character references #1045

@yuki-matsuhashi

In line with the repository's published vulnerability-reporting guidance, I am reporting this issue here as a public issue.

Summary

When parsing crafted XML containing an out-of-range numeric character reference such as &#x110000;, XML#toJSONObject() throws an uncaught IllegalArgumentException instead of a controlled parsing exception such as JSONException.

As a result, applications that parse attacker-controlled XML may encounter an uncaught runtime exception. Depending on the integration, this may result in request failure or denial of service.
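Until the library handles this case itself, a caller could guard the parse call at the integration boundary. The following is only a sketch under assumptions: SafeParseSketch and tryParse are hypothetical names, and the parser is passed in as a function so the example stays self-contained. Catching RuntimeException covers both IllegalArgumentException and org.json's JSONException, which is itself unchecked.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of org.json.
final class SafeParseSketch {

    // Runs the supplied parser and turns any unchecked exception into an
    // empty Optional, so one malformed document cannot abort the caller.
    static <T> Optional<T> tryParse(String input, Function<String, T> parser) {
        try {
            return Optional.ofNullable(parser.apply(input));
        } catch (RuntimeException ex) {
            // Covers IllegalArgumentException as well as controlled
            // parsing exceptions that extend RuntimeException.
            return Optional.empty();
        }
    }
}
```

With org.json on the classpath, a call such as SafeParseSketch.tryParse(xml, XML::toJSONObject) would be the intended use. Swallowing every RuntimeException is deliberately coarse; a real integration would likely log the failure and reject the request.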

I reproduced this in release 20251224.

Details

The apparent root cause is in XMLTokener#unescapeEntity(), where a decoded numeric character reference is passed to the String(int[], int, int) constructor without first being checked as a valid Unicode code point:

    if (e.charAt(0) == '#') {
        int cp;
        if (e.charAt(1) == 'x' || e.charAt(1) == 'X') {
            // hex encoded unicode
            cp = Integer.parseInt(e.substring(2), 16);
        } else {
            // decimal encoded unicode
            cp = Integer.parseInt(e.substring(1));
        }
        return new String(new int[] {cp},0,1);
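One possible direction for a fix is to validate the decoded value before building the string. The sketch below is not the library's code: NumericEntitySketch, decode, and EntityException are hypothetical names, with EntityException standing in for a controlled exception such as JSONException. It also converts NumberFormatException from Integer.parseInt into the same controlled exception, which is an extra assumption beyond what this report covers.

```java
// Hypothetical, self-contained sketch; not the actual XMLTokener code.
final class NumericEntitySketch {

    // Stand-in for a controlled parsing exception such as JSONException.
    static final class EntityException extends RuntimeException {
        EntityException(String message) {
            super(message);
        }
    }

    // Decodes the text between '&' and ';' of a numeric character
    // reference, e.g. "#x41" or "#65".
    static String decode(String e) {
        if (e == null || e.length() < 2 || e.charAt(0) != '#') {
            throw new EntityException("Not a numeric character reference: " + e);
        }
        boolean hex = e.charAt(1) == 'x' || e.charAt(1) == 'X';
        int cp;
        try {
            cp = hex
                    ? Integer.parseInt(e.substring(2), 16)
                    : Integer.parseInt(e.substring(1));
        } catch (NumberFormatException ex) {
            throw new EntityException("Malformed numeric character reference: &" + e + ";");
        }
        // Reject values outside U+0000..U+10FFFF before string construction.
        if (!Character.isValidCodePoint(cp)) {
            throw new EntityException("Code point out of range in &" + e + ";");
        }
        return new String(Character.toChars(cp));
    }
}
```

With this check in place, inputs such as "#x110000" and "#1114112" fail with the controlled exception rather than an IllegalArgumentException raised from inside the String constructor.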

Minimal PoC

XML.toJSONObject("<a>&#x110000;</a>");

I also checked a few closely related inputs while narrowing this down:

  • &#1114112; reproduces the same behavior.
  • The same behavior is also reachable from an attribute value, e.g. <a b="&#x110000;"/>.
  • &#xD800; did not reproduce the same uncaught exception in my testing.

This suggests that the immediate issue here is specifically the handling of out-of-range Unicode code points during string construction, rather than XML-invalid numeric character references in general.
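The difference between the two cases can be checked without the library. In this small sketch (CodePointRangeSketch and constructs are hypothetical names), the String(int[], int, int) constructor accepts a lone surrogate such as 0xD800, because it lies within the code point range, but rejects 0x110000, which lies above Character.MAX_CODE_POINT.

```java
// Hypothetical helper illustrating which values the String(int[], int, int)
// constructor accepts; independent of org.json.
final class CodePointRangeSketch {

    // Returns true if a one-element code point array can be turned into a
    // String, false if the constructor throws IllegalArgumentException.
    static boolean constructs(int cp) {
        try {
            new String(new int[] {cp}, 0, 1);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }
}
```

A fix that only checks Character.isValidCodePoint would therefore still let surrogate values through; whether those should also be rejected as XML-invalid is a separate question from the crash reported here.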

Observed Result

java.lang.IllegalArgumentException: 1114112
        at java.base/java.lang.StringUTF16.toBytes(Unknown Source)
        at java.base/java.lang.String.<init>(Unknown Source)
        at org.json.XMLTokener.unescapeEntity(XMLTokener.java:171)
        at org.json.XMLTokener.nextEntity(XMLTokener.java:148)
        at org.json.XMLTokener.nextContent(XMLTokener.java:117)
        at org.json.XML.parse(XML.java:407)
        at org.json.XML.toJSONObject(XML.java:780)
        at org.json.XML.toJSONObject(XML.java:866)
        at org.json.XML.toJSONObject(XML.java:665)
        at PoC.main(PoC.java:7)
