Skip to content

Avoid reflective InputSource modelId mutation#12147

Open
Will-thom wants to merge 1 commit into
apache:maven-4.0.xfrom
Will-thom:issue-11974-inputsource-final-warning-4.0
Open

Avoid reflective InputSource modelId mutation#12147
Will-thom wants to merge 1 commit into
apache:maven-4.0.xfrom
Will-thom:issue-11974-inputsource-final-warning-4.0

Conversation

@Will-thom
Copy link
Copy Markdown

@Will-thom Will-thom commented May 24, 2026

Fixes #11974. Tests: mvn -B -am -pl impl/maven-impl test

Copy link
Copy Markdown
Contributor

@gnodet gnodet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR replaces a reflective Field.setAccessible hack in DefaultModelBuilder.doReadFileModel() with a proper approach: pre-parsing the POM to extract the model ID before constructing the InputSource. This is a great improvement — the reflective mutation of InputSource.modelId was fragile and would break under strict module systems or JDK 26+ integrity checks.

The implementation is solid overall. A few observations below.

Claude Code on behalf of Guillaume Nodet

ByteArrayOutputStream baos = new ByteArrayOutputStream();
inputStream.transferTo(baos);
byte[] buf = baos.toByteArray();
modelId = extractModelId(new ByteArrayInputStream(buf));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When modelId is null and the input comes from a stream/reader, the entire content is buffered into memory (ByteArrayOutputStream / CharArrayWriter) just to extract the GAV. For very large POMs this is fine, but there's a subtlety: if the stream/reader was already partially consumed before reaching here (unlikely but defensive), this would silently lose data.

Also, consider whether the extractModelId parse could share a factory with the main MavenStaxReader parse to avoid creating two XMLStreamReader instances for every POM read.

}
}

static class InputFactoryHolder {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The InputFactoryHolder lazy-init pattern is good. However, XMLInputFactory.newFactory() is not guaranteed to be thread-safe across all implementations. Since this is a static final, it will be shared across threads. The properties set here (IS_REPLACING_ENTITY_REFERENCES, IS_COALESCING) are set once at init, so the factory itself is safe — but note that some StAX implementations have thread-safety issues when creating readers from a shared factory. The Woodstox implementation used by Maven should be fine, but it's worth a comment.

Also, note that IS_COALESCING = true may have a minor performance cost since it forces the parser to merge adjacent character events. For the limited extraction done here (just groupId/artifactId/version), it's probably negligible.

switch (currentElement) {
case "groupId":
groupId = text;
break;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The early-exit optimization if (artifactId != null && groupId != null && version != null) only checks direct project-level coordinates. If groupId is inherited from the parent and appears after artifactId+version in the POM, the parser will still continue through the entire <project> element. This is correct behavior (the fallback to parent values happens after the loop), but the early-exit condition could also include parentGroupId/parentVersion to exit sooner in the common case where parent is declared before the child elements:

Suggested change
break;
if (artifactId != null
&& (groupId != null || parentGroupId != null)
&& (version != null || parentVersion != null)) {
break;
}

assertNotNull(source);
assertEquals("org.example:child:1", source.getModelId());
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test coverage for the happy path. Consider also adding a test for InputStream-based reading (not just Reader), since the buffering paths differ (ByteArrayOutputStream vs CharArrayWriter). Also, a test where no modelId can be extracted (e.g., a POM missing <artifactId>) would verify the null-fallback behavior.

Copy link
Copy Markdown
Contributor

@gnodet gnodet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good approach — removing the reflection hack is the right call. However, the extractModelId XML parser has a correctness issue that needs to be fixed before this can merge.

Bug: Missing depth tracking in extractModelId

The parser sets currentElement for any groupId/artifactId/version element when inProject is true, regardless of nesting depth. This means elements inside <dependencies>, <plugins>, or other nested structures would be incorrectly treated as the project's GAV.

This is especially problematic for POMs that inherit groupId and/or version from <parent>:

<project>
  <parent>
    <groupId>org.example</groupId>
    <artifactId>parent</artifactId>
    <version>1.0</version>
  </parent>
  <artifactId>child</artifactId>
  <!-- groupId and version inherited from parent -->
  <dependencies>
    <dependency>
      <groupId>dep.group</groupId>
      <artifactId>dep-art</artifactId>
      <version>2.0</version>
    </dependency>
  </dependencies>
</project>

Here the parser would produce dep.group:child:2.0 instead of the correct org.example:child:1.0, because:

  • Project-level groupId is never set (inherited from parent)
  • The early termination condition (groupId != null && artifactId != null && version != null) doesn't trigger
  • The parser continues into <dependencies> and picks up the dependency's groupId and version

Fix: Track depth so only direct children of <project> (depth 1) and <parent> (depth 2) are considered:

int depth = 0;
// ...
if (event == XMLStreamConstants.START_ELEMENT) {
    depth++;
    String localName = reader.getLocalName();
    if (depth == 1 && "project".equals(localName)) {
        inProject = true;
    } else if (depth == 2 && "parent".equals(localName) && inProject) {
        inParent = true;
    } else if (inProject && (depth == 2 || (depth == 3 && inParent))
            && ("groupId".equals(localName) || ...)) {
        currentElement = localName;
    }
} else if (event == XMLStreamConstants.END_ELEMENT) {
    // ...
    depth--;
}

Secondary concern: Double I/O and buffering

For InputStream and Reader inputs, the entire content is buffered into memory (ByteArrayOutputStream/CharArrayWriter) for a pre-parse, then re-read for the main parse. For file-based inputs, the file is opened and parsed twice. Consider whether the modelId extraction could be done in a single pass — e.g., by having MavenStaxReader extract the GAV during its normal parse and providing it to the InputSource constructor. This would avoid both the double parse and the memory overhead.

Please also add a test for the parent-inheritance case described above (POM with dependencies that appear before or without a project-level groupId/version).

Claude Code on behalf of Guillaume Nodet

Copy link
Copy Markdown
Contributor

@gnodet gnodet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on the review — two conclusions after further discussion:

Depth tracking: must fix before merge

The current extractModelId produces wrong results for POMs that inherit groupId or version from <parent> and have dependencies. The inProject flag stays true inside nested elements like <dependencies>, so dependency GAV coordinates overwrite the project-level ones.

Example: a POM with <parent><groupId>org.example</groupId><version>1.0</version></parent>, <artifactId>child</artifactId>, and a dependency with <groupId>dep.group</groupId><version>2.0</version> produces dep.group:child:2.0 instead of org.example:child:1.0.

The END_ELEMENT block also needs depth--; added after line 224 (after clearing currentElement).

Double I/O / buffering: acceptable as-is

The two-pass approach (buffer stream, pre-parse for GAV, main parse for model) is fine. POM files are small (typically a few KB), and the pre-parse bails early once GAV is found. For file-based inputs, the code already avoids buffering by opening the file twice, which is essentially free thanks to the OS page cache. No changes needed here.

Fully automatic review from Claude Code

Comment on lines +190 to +215
String groupId = null;
String artifactId = null;
String version = null;
String parentGroupId = null;
String parentVersion = null;

boolean inProject = false;
boolean inParent = false;
String currentElement = null;

while (reader.hasNext()) {
int event = reader.next();

if (event == XMLStreamConstants.START_ELEMENT) {
String localName = reader.getLocalName();

if ("project".equals(localName)) {
inProject = true;
} else if ("parent".equals(localName) && inProject) {
inParent = true;
} else if (inProject
&& ("groupId".equals(localName)
|| "artifactId".equals(localName)
|| "version".equals(localName))) {
currentElement = localName;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace inProject-based logic with depth tracking to avoid matching GAV elements inside nested structures like <dependencies>:

Suggested change
String groupId = null;
String artifactId = null;
String version = null;
String parentGroupId = null;
String parentVersion = null;
boolean inProject = false;
boolean inParent = false;
String currentElement = null;
while (reader.hasNext()) {
int event = reader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
String localName = reader.getLocalName();
if ("project".equals(localName)) {
inProject = true;
} else if ("parent".equals(localName) && inProject) {
inParent = true;
} else if (inProject
&& ("groupId".equals(localName)
|| "artifactId".equals(localName)
|| "version".equals(localName))) {
currentElement = localName;
}
String groupId = null;
String artifactId = null;
String version = null;
String parentGroupId = null;
String parentVersion = null;
int depth = 0;
boolean inParent = false;
String currentElement = null;
while (reader.hasNext()) {
int event = reader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
depth++;
String localName = reader.getLocalName();
if (depth == 2 && "parent".equals(localName)) {
inParent = true;
} else if ((depth == 2 || (depth == 3 && inParent))
&& ("groupId".equals(localName)
|| "artifactId".equals(localName)
|| "version".equals(localName))) {
currentElement = localName;
}

Also add depth--; in the END_ELEMENT block (after line 224).

Fully automatic review from Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants