Bump chardet from 5.2.0 to 7.2.0 #1653

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/chardet-7.2.0

dependabot[bot] commented on behalf of github on Mar 18, 2026

Bumps chardet from 5.2.0 to 7.2.0.

Release notes

Sourced from chardet's releases.

chardet 7.2.0

Features

  • Added include_encodings and exclude_encodings parameters to detect(), detect_all(), and UniversalDetector — restrict or exclude specific encodings from the candidate set, with corresponding -i/--include-encodings and -x/--exclude-encodings CLI flags (#343)
  • Added no_match_encoding (default "cp1252") and empty_input_encoding (default "utf-8") parameters — control which encoding is returned when no candidate survives the pipeline or the input is empty, with corresponding CLI flags (#343)
  • Added -l/--language flag to chardetect CLI — shows the detected language (ISO 639-1 code and English name) alongside the encoding (#342)
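
The filtering and fallback rules described in the list above can be illustrated with a small stdlib-only sketch. This is not chardet's implementation: `choose_encoding`, the explicit `candidates` argument, and the "first candidate that decodes cleanly wins" rule are assumptions made for illustration. Only the parameter names and the defaults "cp1252" and "utf-8" come from the release notes.

```python
from typing import Iterable, Optional


def choose_encoding(
    data: bytes,
    candidates: Iterable[str],
    include_encodings: Optional[Iterable[str]] = None,
    exclude_encodings: Optional[Iterable[str]] = None,
    no_match_encoding: str = "cp1252",
    empty_input_encoding: str = "utf-8",
) -> str:
    """Hypothetical helper: return the first allowed candidate that decodes data."""
    if not data:
        # Empty input has nothing to analyse, so use the dedicated fallback.
        return empty_input_encoding

    include = None
    if include_encodings is not None:
        include = {name.lower() for name in include_encodings}
    exclude = {name.lower() for name in (exclude_encodings or ())}

    for name in candidates:
        key = name.lower()
        if include is not None and key not in include:
            continue  # not in the allow-list
        if key in exclude:
            continue  # explicitly ruled out
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # candidate cannot decode the data (or is unknown)
        return name

    # No candidate survived filtering and decoding.
    return no_match_encoding
```

With this sketch, excluding "utf-8" for UTF-8 input falls through to the next candidate that decodes, and an allow-list that matches nothing yields the `no_match_encoding` value.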

Fixes

  • Fixed null-separated ASCII data being misdetected as UTF-16-BE (#346, #347)
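
As a rough illustration of why this kind of input is ambiguous, the stdlib-only sketch below shows a NUL-separated ASCII buffer that also decodes without error as UTF-16-BE. The sample bytes are made up for the example, and the snippet does not reproduce chardet's detection logic.

```python
# NUL-separated ASCII fields, similar to what `find -print0` emits.
data = b"ab\x00cd\x00"

# Interpreted as ASCII, the NUL bytes are just field separators.
as_ascii_fields = [field for field in data.decode("ascii").split("\x00") if field]

# Interpreted as UTF-16-BE, the same six bytes form three valid code units
# (0x6162, 0x0063, 0x6400), so decoding succeeds and yields unrelated text.
as_utf16_be = data.decode("utf-16-be")

print(as_ascii_fields)
print(len(as_utf16_be))
```

Because both decodes succeed, a detector has to rely on more than "does it decode" to tell the two apart.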

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

chardet 7.1.0

Features

  • Added PEP 263 encoding declaration detection — # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 (#249)
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning (#341)
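
For readers unfamiliar with PEP 263, the following stdlib-only sketch finds a coding declaration on the first two lines of a byte string using the pattern given in the PEP. `declared_encoding` is a hypothetical helper, not chardet's code, and it simplifies the rule slightly: it does not check that line 1 is a comment or blank before honouring a declaration on line 2.

```python
import re
from typing import Optional

# Pattern from PEP 263, applied to bytes because the encoding is not yet known.
_CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes) -> Optional[str]:
    """Return the encoding named on line 1 or 2, or None if there is none."""
    for line in source.splitlines()[:2]:
        match = _CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None
```

A declaration on line 3 or later is ignored, which matches the "lines 1–2" wording in the release note.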

Fixes

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns (#332)
  • Fixed 0.5s startup cost on first detect() call — model norms are now computed during loading instead of lazily iterating 21M entries (#333)
  • Fixed undocumented encoding name changes between chardet 5.x and 7.0 — detect() now returns chardet 5.x-compatible names by default (#338)
  • Improved ISO-2022-JP family detection — recognizes ESC sequences for ISO-2022-JP-2004 (JIS X 0213) and ISO-2022-JP-EXT (JIS X 0201 Kana)
  • Fixed silent truncation of corrupt model data (iter_unpack yielded fewer tuples instead of raising)
  • Fixed incorrect date in LICENSE

Performance

  • 5.5x faster first-detect time (~0.42s → ~0.075s) by computing model norms as a side-product of load_models()
  • ~40% faster model parsing via struct.iter_unpack for bulk entry extraction (eliminates ~305K individual unpack calls)
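
The bulk-parsing idea, and the truncation pitfall mentioned under Fixes, can be sketched with the standard `struct` module. The record layout (`<HHI`) and the count header (`<I`) below are hypothetical and are not chardet's model format; the point is that `iter_unpack` accepts any buffer whose length is a multiple of the record size, so a file cut at a record boundary parses "successfully" unless the count is checked.

```python
import struct

HEADER = struct.Struct("<I")    # hypothetical: number of entries that follow
ENTRY = struct.Struct("<HHI")   # hypothetical fixed-size record (8 bytes)


def parse_entries(blob: bytes) -> list:
    """Parse all records in one pass, raising if the payload is short or long."""
    (expected,) = HEADER.unpack_from(blob, 0)
    body = blob[HEADER.size:]
    if len(body) != expected * ENTRY.size:
        raise ValueError(
            f"expected {expected} entries ({expected * ENTRY.size} bytes), "
            f"got {len(body)} bytes"
        )
    # One iter_unpack call replaces a Python-level loop of unpack() calls.
    return list(ENTRY.iter_unpack(body))
```

Without the length check, a blob missing its last record would simply yield one tuple fewer.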

New API parameters

  • Added compat_names parameter (default True) to detect(), detect_all(), and UniversalDetector — set to False to get raw Python codec names instead of chardet 5.x/6.x compatible display names
  • Added prefer_superset parameter (default False) — remaps legacy ISO/subset encodings to their modern Windows/CP superset equivalents (e.g., ASCII → Windows-1252, ISO-8859-1 → Windows-1252). This will default to True in the next major version (8.0).
  • Deprecated should_rename_legacy in favor of prefer_superset — a deprecation warning is emitted when used
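
The motivation for remapping to a superset can be shown with the standard codecs alone. In the sketch below, `SUPERSET_OF` contains only the two mappings quoted in the note above and `remap_to_superset` is a hypothetical helper; neither is chardet's actual table or API.

```python
# Only the two example mappings from the release notes; not a complete table.
SUPERSET_OF = {
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
}


def remap_to_superset(name: str) -> str:
    """Return the superset encoding for a legacy name, or the name unchanged."""
    return SUPERSET_OF.get(name.lower(), name)


# Bytes 0x93 and 0x94 are curly quotes in Windows-1252 but C1 control
# characters in ISO-8859-1, so the superset gives the more useful text.
raw = b"\x93quoted\x94"
as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode(remap_to_superset("ISO-8859-1"))
```

Text that is labelled ISO-8859-1 but was produced on Windows often contains such bytes, which is the usual argument for preferring the superset.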

Improvements

  • Switched internal canonical encoding names to Python codec names (e.g., "utf-8" instead of "UTF-8"), with compat_names controlling the public output format
  • Added lookup_encoding() to registry for case-insensitive resolution of arbitrary encoding name input to canonical names
  • Achieved 100% line coverage across all source modules (+31 tests)
  • Updated benchmark numbers: 98.2% encoding accuracy, 95.2% language accuracy on 2,510 test files
  • Pinned test-data cloning to chardet release version tags for reproducible builds
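
Case-insensitive resolution of encoding names has a close analogue in the standard library, sketched below with `codecs.lookup`. This is not chardet's `lookup_encoding()`, whose signature and return values are not shown in these notes; `canonical_codec_name` is a hypothetical wrapper.

```python
import codecs
from typing import Optional


def canonical_codec_name(name: str) -> Optional[str]:
    """Resolve an arbitrary spelling to Python's codec name, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


print(canonical_codec_name("UTF8"))
print(canonical_codec_name("Windows-1252"))
```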

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.0.1

... (truncated)

Changelog

Sourced from chardet's changelog.

7.2.0 (2026-03-17)

Features:

  • Added include_encodings and exclude_encodings parameters to chardet.detect, chardet.detect_all, and chardet.UniversalDetector to restrict or exclude specific encodings from the candidate set, with corresponding -i/--include-encodings and -x/--exclude-encodings CLI flags ([Dan Blanchard](https://github.com/dan-blanchard), [#343](https://github.com/chardet/chardet/pull/343))
  • Added no_match_encoding (default "cp1252") and empty_input_encoding (default "utf-8") parameters, which control the encoding returned when no candidate survives the pipeline or the input is empty, with corresponding CLI flags ([Dan Blanchard](https://github.com/dan-blanchard), [#343](https://github.com/chardet/chardet/pull/343))
  • Added -l/--language flag to chardetect CLI, which shows the detected language (ISO 639-1 code and English name) alongside the encoding ([Dan Blanchard](https://github.com/dan-blanchard), [#342](https://github.com/chardet/chardet/pull/342))

7.1.0 (2026-03-11)

Features:

  • Added PEP 263 encoding declaration detection: # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 ([Dan Blanchard](https://github.com/dan-blanchard), [#249](https://github.com/chardet/chardet/issues/249))
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning ([Dan Blanchard](https://github.com/dan-blanchard), [#341](https://github.com/chardet/chardet/issues/341))

Fixes:

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns ([Dan Blanchard](https://github.com/dan-blanchard), [#332](https://github.com/chardet/chardet/issues/332), [#335](https://github.com/chardet/chardet/pull/335))
  • Fixed 0.5s startup cost on first detect() call — model norms are now computed during loading instead of lazily iterating 21M entries (Dan Blanchard <https://github.com/dan-blanchard>_,

... (truncated)

Commits
  • 884996a docs: set 7.2.0 release date to 2026-03-17
  • 64361f8 docs: add CLI examples for --no-match-encoding and --empty-input-encoding
  • 89a9a4c Fix null-separated ASCII misdetected as UTF-16-BE (#347)
  • a98f097 docs: add example output to all CLI commands for consistency
  • d2f4ac2 docs: document 7.2.0 features (encoding filters, --language CLI flag)
  • 575fa96 test: add include_encodings accuracy preservation tests
  • 66e21fc fix: strengthen weak tests and remove duplicates
  • e1428c3 Add include/exclude encoding filters (#343)
  • 63e90b5 fix: pass --no-binary chardet for pinned versions with --pure
  • 2fe8993 fix: handle missing ISO_TO_LANGUAGE in older chardet versions
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.2.0.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.2.0)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.2.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies and python labels on Mar 18, 2026

codecov[bot] commented on Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.72%. Comparing base (b98e44b) to head (c5a4123).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1653      +/-   ##
==========================================
- Coverage   54.74%   54.72%   -0.02%     
==========================================
  Files         335      335              
  Lines       27400    27400              
==========================================
- Hits        15000    14996       -4     
- Misses      12400    12404       +4     
| Flag | Coverage Δ |
| functionaltests | 0.00% <ø> (ø) |
| unittests | 54.72% <ø> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.
