Avoid reflective InputSource modelId mutation#12147
Conversation
gnodet
left a comment
There was a problem hiding this comment.
This PR replaces a reflective Field.setAccessible hack in DefaultModelBuilder.doReadFileModel() with a proper approach: pre-parsing the POM to extract the model ID before constructing the InputSource. This is a great improvement — the reflective mutation of InputSource.modelId was fragile and would break under strict module systems or JDK 26+ integrity checks.
The implementation is solid overall. A few observations below.
Claude Code on behalf of Guillaume Nodet
| ByteArrayOutputStream baos = new ByteArrayOutputStream(); | ||
| inputStream.transferTo(baos); | ||
| byte[] buf = baos.toByteArray(); | ||
| modelId = extractModelId(new ByteArrayInputStream(buf)); |
There was a problem hiding this comment.
When modelId is null and the input comes from a stream/reader, the entire content is buffered into memory (ByteArrayOutputStream / CharArrayWriter) just to extract the GAV. For very large POMs this is fine, but there's a subtlety: if the stream/reader was already partially consumed before reaching here (unlikely but defensive), this would silently lose data.
Also, consider whether the extractModelId parse could share a factory with the main MavenStaxReader parse to avoid creating two XMLStreamReader instances for every POM read.
| } | ||
| } | ||
|
|
||
| static class InputFactoryHolder { |
There was a problem hiding this comment.
The InputFactoryHolder lazy-init pattern is good. However, XMLInputFactory.newFactory() is not guaranteed to be thread-safe across all implementations. Since this is a static final, it will be shared across threads. The properties set here (IS_REPLACING_ENTITY_REFERENCES, IS_COALESCING) are set once at init, so the factory itself is safe — but note that some StAX implementations have thread-safety issues when creating readers from a shared factory. The Woodstox implementation used by Maven should be fine, but it's worth a comment.
Also, note that IS_COALESCING = true may have a minor performance cost since it forces the parser to merge adjacent character events. For the limited extraction done here (just groupId/artifactId/version), it's probably negligible.
| switch (currentElement) { | ||
| case "groupId": | ||
| groupId = text; | ||
| break; |
There was a problem hiding this comment.
The early-exit optimization if (artifactId != null && groupId != null && version != null) only checks direct project-level coordinates. If groupId is inherited from the parent and appears after artifactId+version in the POM, the parser will still continue through the entire <project> element. This is correct behavior (the fallback to parent values happens after the loop), but the early-exit condition could also include parentGroupId/parentVersion to exit sooner in the common case where parent is declared before the child elements:
| break; | |
| if (artifactId != null | |
| && (groupId != null || parentGroupId != null) | |
| && (version != null || parentVersion != null)) { | |
| break; | |
| } |
| assertNotNull(source); | ||
| assertEquals("org.example:child:1", source.getModelId()); | ||
| } | ||
|
|
There was a problem hiding this comment.
Good test coverage for the happy path. Consider also adding a test for InputStream-based reading (not just Reader), since the buffering paths differ (ByteArrayOutputStream vs CharArrayWriter). Also, a test where no modelId can be extracted (e.g., a POM missing <artifactId>) would verify the null-fallback behavior.
gnodet
left a comment
There was a problem hiding this comment.
Good approach — removing the reflection hack is the right call. However, the extractModelId XML parser has a correctness issue that needs to be fixed before this can merge.
Bug: Missing depth tracking in extractModelId
The parser sets currentElement for any groupId/artifactId/version element when inProject is true, regardless of nesting depth. This means elements inside <dependencies>, <plugins>, or other nested structures would be incorrectly treated as the project's GAV.
This is especially problematic for POMs that inherit groupId and/or version from <parent>:
<project>
<parent>
<groupId>org.example</groupId>
<artifactId>parent</artifactId>
<version>1.0</version>
</parent>
<artifactId>child</artifactId>
<!-- groupId and version inherited from parent -->
<dependencies>
<dependency>
<groupId>dep.group</groupId>
<artifactId>dep-art</artifactId>
<version>2.0</version>
</dependency>
</dependencies>
</project>Here the parser would produce dep.group:child:2.0 instead of the correct org.example:child:1.0, because:
- Project-level
groupIdis never set (inherited from parent) - The early termination condition (
groupId != null && artifactId != null && version != null) doesn't trigger - The parser continues into
<dependencies>and picks up the dependency'sgroupIdandversion
Fix: Track depth so only direct children of <project> (depth 1) and <parent> (depth 2) are considered:
int depth = 0;
// ...
if (event == XMLStreamConstants.START_ELEMENT) {
depth++;
String localName = reader.getLocalName();
if (depth == 1 && "project".equals(localName)) {
inProject = true;
} else if (depth == 2 && "parent".equals(localName) && inProject) {
inParent = true;
} else if (inProject && (depth == 2 || (depth == 3 && inParent))
&& ("groupId".equals(localName) || ...)) {
currentElement = localName;
}
} else if (event == XMLStreamConstants.END_ELEMENT) {
// ...
depth--;
}Secondary concern: Double I/O and buffering
For InputStream and Reader inputs, the entire content is buffered into memory (ByteArrayOutputStream/CharArrayWriter) for a pre-parse, then re-read for the main parse. For file-based inputs, the file is opened and parsed twice. Consider whether the modelId extraction could be done in a single pass — e.g., by having MavenStaxReader extract the GAV during its normal parse and providing it to the InputSource constructor. This would avoid both the double parse and the memory overhead.
Please also add a test for the parent-inheritance case described above (POM with dependencies that appear before or without a project-level groupId/version).
Claude Code on behalf of Guillaume Nodet
gnodet
left a comment
There was a problem hiding this comment.
Following up on the review — two conclusions after further discussion:
Depth tracking: must fix before merge
The current extractModelId produces wrong results for POMs that inherit groupId or version from <parent> and have dependencies. The inProject flag stays true inside nested elements like <dependencies>, so dependency GAV coordinates overwrite the project-level ones.
Example: a POM with <parent><groupId>org.example</groupId><version>1.0</version></parent>, <artifactId>child</artifactId>, and a dependency with <groupId>dep.group</groupId><version>2.0</version> produces dep.group:child:2.0 instead of org.example:child:1.0.
The END_ELEMENT block also needs depth--; added after line 224 (after clearing currentElement).
Double I/O / buffering: acceptable as-is
The two-pass approach (buffer stream, pre-parse for GAV, main parse for model) is fine. POM files are small (typically a few KB), and the pre-parse bails early once GAV is found. For file-based inputs, the code already avoids buffering by opening the file twice, which is essentially free thanks to the OS page cache. No changes needed here.
Fully automatic review from Claude Code
| String groupId = null; | ||
| String artifactId = null; | ||
| String version = null; | ||
| String parentGroupId = null; | ||
| String parentVersion = null; | ||
|
|
||
| boolean inProject = false; | ||
| boolean inParent = false; | ||
| String currentElement = null; | ||
|
|
||
| while (reader.hasNext()) { | ||
| int event = reader.next(); | ||
|
|
||
| if (event == XMLStreamConstants.START_ELEMENT) { | ||
| String localName = reader.getLocalName(); | ||
|
|
||
| if ("project".equals(localName)) { | ||
| inProject = true; | ||
| } else if ("parent".equals(localName) && inProject) { | ||
| inParent = true; | ||
| } else if (inProject | ||
| && ("groupId".equals(localName) | ||
| || "artifactId".equals(localName) | ||
| || "version".equals(localName))) { | ||
| currentElement = localName; | ||
| } |
There was a problem hiding this comment.
Replace inProject-based logic with depth tracking to avoid matching GAV elements inside nested structures like <dependencies>:
| String groupId = null; | |
| String artifactId = null; | |
| String version = null; | |
| String parentGroupId = null; | |
| String parentVersion = null; | |
| boolean inProject = false; | |
| boolean inParent = false; | |
| String currentElement = null; | |
| while (reader.hasNext()) { | |
| int event = reader.next(); | |
| if (event == XMLStreamConstants.START_ELEMENT) { | |
| String localName = reader.getLocalName(); | |
| if ("project".equals(localName)) { | |
| inProject = true; | |
| } else if ("parent".equals(localName) && inProject) { | |
| inParent = true; | |
| } else if (inProject | |
| && ("groupId".equals(localName) | |
| || "artifactId".equals(localName) | |
| || "version".equals(localName))) { | |
| currentElement = localName; | |
| } | |
| String groupId = null; | |
| String artifactId = null; | |
| String version = null; | |
| String parentGroupId = null; | |
| String parentVersion = null; | |
| int depth = 0; | |
| boolean inParent = false; | |
| String currentElement = null; | |
| while (reader.hasNext()) { | |
| int event = reader.next(); | |
| if (event == XMLStreamConstants.START_ELEMENT) { | |
| depth++; | |
| String localName = reader.getLocalName(); | |
| if (depth == 2 && "parent".equals(localName)) { | |
| inParent = true; | |
| } else if ((depth == 2 || (depth == 3 && inParent)) | |
| && ("groupId".equals(localName) | |
| || "artifactId".equals(localName) | |
| || "version".equals(localName))) { | |
| currentElement = localName; | |
| } |
Also add depth--; in the END_ELEMENT block (after line 224).
Fully automatic review from Claude Code
Fixes #11974. Tests: mvn -B -am -pl impl/maven-impl test