Broaden update_versions.py regex to handle custom XML property tags #49262
jeet1995 wants to merge 1 commit
The regex previously only matched version values inside <version>...</version>
elements. Custom Maven property tags like <scala-jackson.version>2.18.6
</scala-jackson.version> that carry valid {x-version-update} comments were
silently skipped, causing version drift and bannedDependencies failures
(see PR #49261 for the immediate fix).
The new regex matches the content of any XML element, not just <version>.
The lookahead (?=</[a-zA-Z]) ensures content before XML comments (<!-- -->)
is not matched.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The update_versions.py script's regex only matches <version> XML elements,
silently skipping custom property tags like <scala-jackson.version> despite
valid {x-version-update} comments. When PR #49180 bumped Jackson to 2.18.7,
these properties were left stale, causing the enforcer to ban the correct
2.18.7 dependency.
Bump scala-jackson.version from 2.18.6/2.18.4 to 2.18.7 in all four Spark
parent POMs. The underlying script limitation is tracked in PR #49262.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Match version content inside any XML element, not just <version>.
# This handles custom property tags like <scala-jackson.version>2.18.6</scala-jackson.version>
# that carry {x-version-update} comments but are not wrapped in <version> elements.
# The lookahead (?=</[a-zA-Z]) ensures we don't match content before XML comments (<!-- -->).
external_dependency_version_regex = r'(?<=>)[^<]+(?=</[a-zA-Z])'
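To make the behavior change concrete, here is a minimal sketch comparing the broadened pattern with a `<version>`-only pattern. `OLD_REGEX` is an assumed stand-in for the previous pattern (it is not quoted on this page), the `bump` helper is hypothetical, and the sample lines abbreviate the `{x-version-update}` comment body.

```python
import re

# Assumed stand-in for the previous pattern: it only looks inside <version> tags.
OLD_REGEX = r'(?<=<version>).+?(?=</version>)'
# Pattern from the diff above.
NEW_REGEX = r'(?<=>)[^<]+(?=</[a-zA-Z])'

# Hypothetical POM lines; the comment body is abbreviated for illustration.
version_line = '<version>2.18.6</version> <!-- {x-version-update} -->'
property_line = (
    '<scala-jackson.version>2.18.6</scala-jackson.version> '
    '<!-- {x-version-update} -->'
)


def bump(regex, line, new_version='2.18.7'):
    """Replace every match of regex on the line with new_version."""
    return re.sub(regex, new_version, line)


old_on_property = bump(OLD_REGEX, property_line)
new_on_property = bump(NEW_REGEX, property_line)
new_on_version = bump(NEW_REGEX, version_line)

print(old_on_property == property_line)  # the old pattern leaves the line untouched
print(new_on_property)
print(new_on_version)
```

With the assumed old pattern the substitution returns the property line unchanged and raises nothing, which is the silent no-op this PR is about.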
Would it be possible to restrict this slightly, to match either `<version>...</version>` exactly or a common pattern we use for version definitions in `<properties>...</properties>`? For example, all version declarations in `<properties>...</properties>` are suffixed with `.version`.
The regex could then be something like `(?<=<((?:[\w-.]+\.)?version)>).+?(?=<\/\1>)`, where we look for an XML tag that is either `<version>` itself or `<something-custom.morestuff.version>`. This also changes the logic to use the first capture group to match the closing tag, so we don't need to define the capture group twice.
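One caveat when trying this suggestion in Python: the standard `re` module only accepts fixed-width lookbehinds, so a lookbehind containing an optional prefix is rejected at compile time. The sketch below checks that, then shows one possible way to keep the tag restriction by capturing the opening and closing tags instead. `TAG_SCOPED` and `bump` are hypothetical names, the hyphen is moved to the end of the character class so it is read as a literal, and this is not the regex that landed in #49267.

```python
import re

# The suggested pattern, with the character class written as [\w.-].
SUGGESTED = r'(?<=<((?:[\w.-]+\.)?version)>).+?(?=</\1>)'

try:
    re.compile(SUGGESTED)
    lookbehind_ok = True
except re.error:
    # Python's re requires look-behind patterns to have a fixed width.
    lookbehind_ok = False

# Hypothetical alternative: capture the opening tag (group 1), its name
# (group 2) and the matching closing tag (group 3), then rebuild the line.
TAG_SCOPED = re.compile(r'(<((?:[\w.-]+\.)?version)>)[^<]+(</\2>)')


def bump(line, new_version):
    """Swap the value between <version> or <*.version> tags for new_version."""
    return TAG_SCOPED.sub(
        lambda m: m.group(1) + new_version + m.group(3), line
    )


print(lookbehind_ok)
print(bump('<scala-jackson.version>2.18.6</scala-jackson.version>', '2.18.7'))
print(bump('<description>1.0</description>', '2.18.7'))
```

Using a function as the replacement avoids any backslash or group-reference interpretation of the new version string.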
@alzimmermsft: this PR was auto-closed when I rebased the branch. The updated fix is in #49267 with the tighter regex you requested. It restricts matches to `</version>` and `</something.version>` closing tags only (rejects `</description>`, `</subversion>`, etc.). Python's …
Problem
The `external_dependency_version_regex` in `eng/versioning/utils.py` only matches version values inside `<version>...</version>` XML elements.
When a POM uses a custom Maven property tag with a valid `{x-version-update}` comment, like `<scala-jackson.version>2.18.6</scala-jackson.version>`, the script finds the line (via the `{x-version-update}` comment), but the `re.sub` on line 119 of `update_versions.py` is a silent no-op because the regex lookbehind expects `<version>` and finds `<scala-jackson.version>` instead. The version is never updated, and no warning is emitted.
This caused the `bannedDependencies` failure fixed in PR #49263.
Fix
Broaden the regex to match the content of any XML element, not just `<version>`: `(?<=>)[^<]+(?=</[a-zA-Z])`
How the regex works
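A minimal sketch that exercises the pattern on a hypothetical POM line, next to a looser lookahead for contrast. `LOOSE` is an illustrative variant that accepts any following `<`; it does not appear in the PR, and the sample line abbreviates the `{x-version-update}` comment body.

```python
import re

# Pattern from this PR: '>' behind, non-'<' text, then '</' plus a letter ahead.
PATTERN = r'(?<=>)[^<]+(?=</[a-zA-Z])'
# Illustrative contrast only: the lookahead accepts any following '<'.
LOOSE = r'(?<=>)[^<]+(?=<)'

# Hypothetical POM line with an abbreviated update comment.
line = (
    '<scala-jackson.version>2.18.6</scala-jackson.version> '
    '<!-- {x-version-update} -->'
)

strict_matches = re.findall(PATTERN, line)
loose_matches = re.findall(LOOSE, line)

print(strict_matches)  # only the version text
print(loose_matches)   # the version text and the space before the comment
```

With the looser lookahead, `re.sub` would also overwrite the space between the closing tag and the comment, which is why the lookahead is anchored to a closing tag.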
`(?<=>)`: lookbehind for `>`, the end of an opening tag; works for `<version>`, `<scala-jackson.version>`, or any tag
`[^<]+`: one or more non-`<` characters, i.e. the version text (`2.18.6`)
`(?=</[a-zA-Z])`: lookahead for `</` followed by a letter, i.e. a closing tag; never matches before `<!-- -->` comments
Why `(?=</[a-zA-Z])` and not just `(?=</)`
On lines with `{x-version-update}` comments, the closing tag is followed by an XML comment: after `</version>`, the `>` is followed by ` <!-- ...`. Because `re.sub` replaces all matches, the lookahead must not accept the space between `</version>` and `<!--` as a second match. Requiring `</` followed by a letter ensures we only match content before closing XML tags (`</version>`, `</scala-jackson.version>`), never before comment markers (`<!--`).
Local validation
Ran `update_versions.py --sr` across all 874 POM files with the old regex (`<version>` only) vs the new regex; the only differences are the `scala-jackson.version` properties.
The 4 changed files are the same Spark POMs manually fixed in PR #49263, with identical diffs (`2.18.4`/`2.18.6` -> `2.18.7`). No other files in the repo are affected.
Known limitation
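A minimal sketch of the case this section covers, using a hypothetical line with two elements. It shows the default `re.sub` behavior and the effect of passing `count=1`; the replacement value `9` is arbitrary.

```python
import re

PATTERN = r'(?<=>)[^<]+(?=</[a-zA-Z])'

# Hypothetical line holding two XML elements.
line = '<a>1</a><b>2</b>'

# Default behavior: every match on the line is replaced.
all_replaced = re.sub(PATTERN, '9', line)
# With count=1, only the first match is replaced.
first_only = re.sub(PATTERN, '9', line, count=1)

print(all_replaced)
print(first_only)
```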
If a `{x-version-update}` line contained multiple XML elements (e.g., `<a>1</a><b>2</b>`), `re.sub` would replace both values. This pattern does not exist on any `{x-version-update}` line in the repo. A future hardening option would be to add `count=1` to the `re.sub` call on line 119.
Related