Harden XML parsing via copernik-xml-factory#640
Conversation
Replace all direct JAXP factory instantiations (DocumentBuilderFactory, SAXParserFactory, TransformerFactory) with the hardened factories from eu.copernik:copernik-xml-factory 0.1.1. These factories block external DTD and entity fetching and bound internal entity expansion, regardless of the JAXP implementation on the classpath. Hardening the parsing of a configuration file is admittedly not necessary: configuration files are normally trusted. This limits the side-effects if a user (against advice) decides to parse untrusted configuration files. Changes: - Add the copernik-xml-factory dependency. - Route factory creation through XmlFactories in XMLConfiguration, XMLDocumentHelper, XMLPropertiesConfiguration and XMLPropertyListConfiguration, plus the affected tests. - Harden the source passed to XMLDocumentHelper.transform. - XMLConfiguration's entity resolver now throws a SAXException for unregistered entities instead of returning null. Returning null would let the parser fall back to fetching the external resource and would override the deny-all resolver the hardening factories install on some implementations. This is done in an anonymous DefaultEntityResolver subclass local to XMLConfiguration, leaving the public DefaultEntityResolver contract (return null for unknown entities) unchanged. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 2001 test configuration references its schema through an absolute https URI (xsi:noNamespaceSchemaLocation). The hardened parser does not fetch external resources, so testSchemaValidationError no longer reached the intended schema validation error; the entity resolver refused the remote fetch first. Register the local testMultiConfiguration.xsd for that system URI via an XML catalog (CatalogResolver pointing at the existing catalog.xml, which already rewrites https://commons.apache.org/ to the local test resources) and set it as the builder's entity resolver. The schema now loads locally, so the expected SAXParseException validation error is raised again. Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Converting this to draft, since some Java-version dependent problems arose. |
|
-1: We've already discussed making an new Commons component for this code, let's do that please. |
Sorry, I was busy iterating on code and I forgot to mention that I created this PR only to test the capabilities (or missing capabilities) of For a new component to be useful, it should be adopted by at least a couple of projects. Currently I have tested support for such a library:
|
|
Shading can cause more problems than it's worth as exemplified with the mess some projects make shading ASM. When downstream users blow up because they can't update ASM to test or support a new Java version since that code is now a hard copy 🤔 |
|
JDK 8 is affected by a bug, which denies access based on This was theoretically fixed by JDK-8159139, but still doesn't work. |
I totally agree, but each project should be able to choose its poison. Unlike ASM, however, XML support is unlikely to change very much in the future, so if a project decides to replace a bunch of cargo-culted hardening options without any tests, with a shaded and relocated class, this still counts as a win. |
|
Thanks for the clarifications Piotr. Let's bring the component into Commons and iterate there. It will become more visible and it doesn't need to be perfect or bug free on the first commit. We'll want to create an alpha or milestone or pick-your-name release before a 1.0 anyway. WDYT? |
Apparently OpenJDK never got the fix, probably the fix is only present in Oracle's builds. This mean that on JDK 8 parsing this file will always fail if <configuration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="https://commons.apache.org/testMultiConfiguration.xsd">
...
</configuration>This is a bummer, because on JDK 9+ libraries can choose to call |
Sounds good to me: please choose a repository name and I'll transfer the code to Commons. Do we need a new JIRA project too? |
|
Here we go:
|
Summary
Route every direct JAXP factory instantiation through the hardened factories from
eu.copernik:copernik-xml-factory0.1.1. These factories block external DTD and entity fetching and bound internal entity expansion, regardless of the JAXP implementation on the classpath.Hardening the parsing of a configuration file is admittedly not necessary: configuration files are normally trusted. This limits the side-effects if a user (against advice) decides to parse untrusted configuration files.
Changes
copernik-xml-factorydependency.XmlFactories(DocumentBuilderFactory,SAXParserFactory,TransformerFactory) inXMLConfiguration,XMLDocumentHelper,XMLPropertiesConfigurationandXMLPropertyListConfiguration, plus the affected tests.XMLDocumentHelper.transform.XMLConfiguration's entity resolver now throws aSAXExceptionfor unregistered entities instead of returningnull. Returningnullwould let the parser fall back to fetching the external resource and would override the deny-all resolver the hardening factories install on some implementations. This is done in an anonymousDefaultEntityResolversubclass local toXMLConfiguration, leaving the publicDefaultEntityResolvercontract (returnnullfor unknown entities) unchanged.Testing
mvn testfor the affected XML test classes: 145 tests, all passing.mvn checkstyle:check: clean.