Skip to content

Broaden update_versions.py regex to handle custom XML property tags#49262

Closed
jeet1995 wants to merge 1 commit into
mainfrom
fix/update-versions-regex-custom-properties
Closed

Broaden update_versions.py regex to handle custom XML property tags#49262
jeet1995 wants to merge 1 commit into
mainfrom
fix/update-versions-regex-custom-properties

Conversation

@jeet1995
Copy link
Copy Markdown
Member

@jeet1995 jeet1995 commented May 26, 2026

Problem

The external_dependency_version_regex in eng/versioning/utils.py only matches version values inside <version>...</version> XML elements:

external_dependency_version_regex = r'(?<=<version>).+?(?=</version>)'

When a POM uses a custom Maven property tag with a valid {x-version-update} comment, like:

<scala-jackson.version>2.18.6</scala-jackson.version> <!-- {x-version-update;...;external_dependency} -->

The script finds the line (via the {x-version-update} comment), but the re.sub on line 119 of update_versions.py is a silent no-op because the regex lookbehind expects <version> and finds <scala-jackson.version> instead. The version is never updated, no warning is emitted.

This caused the bannedDependencies failure fixed in PR #49263.

Fix

Broaden the regex to match the content of any XML element, not just <version>:

external_dependency_version_regex = r'(?<=>)[^<]+(?=</[a-zA-Z])'

How the regex works

Component Matches Purpose
(?<=>) Lookbehind for > Anchors after the opening tag's > -- works for <version>, <scala-jackson.version>, or any tag
[^<]+ One or more non-< characters Captures the version value (e.g., 2.18.6)
(?=</[a-zA-Z]) Lookahead for </ followed by a letter Ensures we stop at a closing XML tag, not at <!-- --> comments

Why (?=</[a-zA-Z]) and not just (?=</)

Lines with {x-version-update} comments look like:

    <version>2.18.6</version> <!-- {x-version-update;...} -->
                     ^^^^^^^    ^^^^
                     </version> <!-- (comment start)

After </version>, the > is followed by <!-- .... Without the [a-zA-Z] guard, the regex would also match the space between </version> and <!-- as a second match (since re.sub replaces all matches). The [a-zA-Z] ensures we only match content before closing XML tags (</version>, </scala-jackson.version>), never before comment markers (<!--).

Local validation

Ran update_versions.py --sr across all 874 POM files with old vs new regex:

Regex Files processed Files changed Delta
OLD (<version> only) 874 0 --
NEW (any XML element) 874 4 Exactly the 4 stale scala-jackson.version properties

The 4 changed files are the same Spark POMs manually fixed in PR #49263, with identical diffs (2.18.4/2.18.6 -> 2.18.7). No other files in the repo are affected.

Known limitation

If a {x-version-update} line contained multiple XML elements (e.g., <a>1</a><b>2</b>), re.sub would replace both values. This pattern does not exist on any {x-version-update} line in the repo. A future hardening option would be to add count=1 to the re.sub call on line 119.

Related

…ty tags

The regex previously only matched version values inside <version>...</version>
elements. Custom Maven property tags like <scala-jackson.version>2.18.6
</scala-jackson.version> that carry valid {x-version-update} comments were
silently skipped, causing version drift and bannedDependencies failures
(see PR #49261 for the immediate fix).

The new regex matches the content of any XML element, not just <version>.
The lookahead (?=</[a-zA-Z]) ensures content before XML comments (<!-- -->)
is not matched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jeet1995 added a commit that referenced this pull request May 26, 2026
The update_versions.py script's regex only matches <version> XML elements,
silently skipping custom property tags like <scala-jackson.version> despite
valid {x-version-update} comments. When PR #49180 bumped Jackson to 2.18.7,
these properties were left stale, causing the enforcer to ban the correct
2.18.7 dependency.

Bump scala-jackson.version from 2.18.6/2.18.4 to 2.18.7 in all four Spark
parent POMs. The underlying script limitation is tracked in PR #49262.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jeet1995 added a commit that referenced this pull request May 26, 2026
…49263)

* Fix bannedDependencies failure: bump scala-jackson.version to 2.18.7

The update_versions.py script's regex only matches <version> XML elements,
silently skipping custom property tags like <scala-jackson.version> despite
valid {x-version-update} comments. When PR #49180 bumped Jackson to 2.18.7,
these properties were left stale, causing the enforcer to ban the correct
2.18.7 dependency.

Bump scala-jackson.version from 2.18.6/2.18.4 to 2.18.7 in all four Spark
parent POMs. The underlying script limitation is tracked in PR #49262.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trigger Cosmos CI

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread eng/versioning/utils.py
Comment on lines +30 to +34
# Match version content inside any XML element, not just <version>.
# This handles custom property tags like <scala-jackson.version>2.18.6</scala-jackson.version>
# that carry {x-version-update} comments but are not wrapped in <version> elements.
# The lookahead (?=</[a-zA-Z]) ensures we don't match content before XML comments (<!-- -->).
external_dependency_version_regex = r'(?<=>)[^<]+(?=</[a-zA-Z])'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to restrict this slightly to either matching <version>...</version> exactly or a common pattern we use for <properties>...</properties> version definitions? Such as all version declarations in <properties>...</properties> suffix with .version.

The regex then can be something like (?<=<((?:[\w-.]+\.)?version)>).+?(?=<\/\1>), where we look for an XML tag that is either <version> itself or <something-custom.morestuff.version>. This also changes the logic to use the first capture group to match the closing tag so we don't need to double define the capture group.

@jeet1995 jeet1995 closed this May 26, 2026
@jeet1995 jeet1995 deleted the fix/update-versions-regex-custom-properties branch May 26, 2026 19:46
@jeet1995
Copy link
Copy Markdown
Member Author

@alzimmermsft — PR was auto-closed when I rebased the branch. The updated fix is in #49267 with the tighter regex you requested:

\\python
r'(?<=>)[^<]+(?=</(?:[\w.-]+.)?version>)'
\\

This restricts matches to </version>\ and </something.version>\ closing tags only (rejects </description>, </subversion>, etc.). Python's
e\ module doesn't support variable-length lookbehinds, so the restriction is placed entirely in the lookahead. Full details and validation table in the new PR description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants