
deps: bump pdf_oxide from 0.3.45 to 0.3.51 #37

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/cargo/pdf_oxide-0.3.51


dependabot bot commented on behalf of github on May 19, 2026

Bumps pdf_oxide from 0.3.45 to 0.3.51.

Release notes

Sourced from pdf_oxide's releases.

v0.3.48 | Pluggable cryptographic provider — FIPS 140-3 compliance for

This release lands the office converter integration (#159): bidirectional PDF ↔ DOCX/PPTX/XLSX round-trip with layout-preserving fidelity, exposed through all seven bindings (Rust, Python, Node, WASM, C FFI, C#, Go). Typical text-heavy PDFs round-trip through an Office file and back at near-pixel parity to the source. The corpus harness used to validate the integration covers 26 PDFs spanning academic papers, hymnals, multi-column newspapers, slide decks, government forms, and policy documents.

Closes the v0.3.14-milestone feature request "PDF to Word/DOCX export": text styling (fonts / sizes / colours) preserved via layout-mode writers + Unicode/CJK system-font fallback; paragraphs / headings / lists preserved via positional frame anchors; image placement preserved via raster Image XObject + Form XObject rasterization. Tables flow through positional shapes (grid-aware reconstruction is still follow-up work).

Added

  • Bidirectional PDF ↔ DOCX/PPTX/XLSX conversion (#159) — new OfficeConverter API converts in both directions across DOCX, PPTX, and XLSX. Layout-preserving writers (src/converters/{docx,pptx,xlsx}_layout.rs) emit one positionally-anchored shape / frame per PDF text span; the back-direction render path (render_positional_ir / render_pptx_positional) reproduces the source page near-identically. Available on every binding via the 09-new-features/office_conversion/ examples.

  • Unicode + CJK system-font fallback for office round-trip (src/fonts/unicode_fallback.rs) — when the source PDF embeds a CID-only font subset the writer can't re-embed, a system Unicode face (DejaVu Sans → FreeSans → Noto Sans → Tinos / Arimo) and a CJK face (DroidSansFallbackFull → IPAGothic → NanumGothic → Unifont) are registered automatically. needs_unicode_fallback is WinAnsi-aware (curly quotes / em-en dashes / bullet / ellipsis / trademark stay on the source font); CJK ranges (Han / Hiragana / Katakana / Hangul / Compatibility Forms / Halfwidth–Fullwidth) route to the CJK face first. Restores Hebrew, Arabic, Latin Extended, Chinese, Japanese, and Korean characters that previously rendered as ? glyphs across all three formats.

  • Music-notation region detection + rasterization (src/converters/music_region_finder.rs) — hymnals and sheet-music PDFs (Finale Maestro, SMuFL Bravura, Sibelius Petrucci / Opus, Adobe Sonata, LilyPond Emmentaler, …) are detected by combining a music-font allowlist with a 5-line staff-clustering pass on extract_paths. Detected music systems are rasterized once at

... (truncated)
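The font-routing rule described in the Unicode + CJK fallback entry can be sketched roughly as follows. This is an illustration only, not pdf_oxide's code: `Face`, `is_winansi_punctuation`, `is_cjk`, and `pick_face` are hypothetical names, the code-point ranges are the standard Unicode blocks for the scripts the notes list, and treating everything up to U+00FF as covered by the source font is a simplifying assumption.

```rust
/// Which face a character would be drawn with (hypothetical names).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Face {
    Source,
    UnicodeFallback,
    CjkFallback,
}

/// Punctuation the release notes say stays on the source font,
/// because WinAnsi can encode it.
fn is_winansi_punctuation(c: char) -> bool {
    matches!(
        c,
        '\u{2018}' | '\u{2019}' | '\u{201C}' | '\u{201D}' // curly quotes
            | '\u{2013}' | '\u{2014}' // en and em dashes
            | '\u{2022}' // bullet
            | '\u{2026}' // ellipsis
            | '\u{2122}' // trademark
    )
}

/// Standard Unicode blocks for the CJK ranges named in the notes.
fn is_cjk(c: char) -> bool {
    matches!(
        c as u32,
        0x4E00..=0x9FFF // CJK Unified Ideographs (Han)
            | 0x3040..=0x309F // Hiragana
            | 0x30A0..=0x30FF // Katakana
            | 0xAC00..=0xD7AF // Hangul Syllables
            | 0xFE30..=0xFE4F // CJK Compatibility Forms
            | 0xFF00..=0xFFEF // Halfwidth and Fullwidth Forms
    )
}

/// CJK is checked first, then the WinAnsi-safe set, then the
/// general Unicode fallback for everything else.
fn pick_face(c: char) -> Face {
    if is_cjk(c) {
        Face::CjkFallback
    } else if (c as u32) <= 0xFF || is_winansi_punctuation(c) {
        // Assumption: Latin-1 is treated as covered by the source font.
        Face::Source
    } else {
        Face::UnicodeFallback
    }
}
```

Under these assumptions, `pick_face('\u{2014}')` stays on `Face::Source`, a Hebrew letter goes to `Face::UnicodeFallback`, and a Han or Hangul character goes to `Face::CjkFallback`.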

Changelog

Sourced from pdf_oxide's changelog.

[0.3.51] - 2026-05-17

Comprehensive auto extraction — per-page text-vs-OCR with typed reason codes, graceful native fallback, and image-table recovery — across all seven bindings plus the CLI and MCP server; a pre-merge release-pipeline dry-run; and five bundled fixes.
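As a rough illustration of "typed reason codes" with a graceful native fallback, the sketch below labels a page result by where its text actually came from. It is not the crate's API. `Fallback`, `Ocr`, and `OcrRequestedButUnavailable` are names the changelog uses; `PageResult`, `resolve_page`, `Source::Native`, `NativeTextOk`, and `OcrSucceeded` are hypothetical and exist only for this example.

```rust
/// Where the returned text came from.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Source {
    Native, // hypothetical name
    Ocr,
    Fallback,
}

/// Machine-readable reason attached to every page result.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Reason {
    NativeTextOk, // hypothetical name
    OcrSucceeded, // hypothetical name
    OcrRequestedButUnavailable,
}

#[derive(Debug)]
struct PageResult {
    text: String,
    source: Source,
    reason: Reason,
}

/// Decide the label for one page. `needs_ocr` stands in for the
/// classifier's verdict; `ocr_text` is `None` when OCR failed or is absent.
fn resolve_page(native_text: &str, needs_ocr: bool, ocr_text: Option<&str>) -> PageResult {
    if !needs_ocr {
        return PageResult {
            text: native_text.to_string(),
            source: Source::Native,
            reason: Reason::NativeTextOk,
        };
    }
    // An empty or whitespace-only OCR result counts as a failed OCR.
    match ocr_text.filter(|t| !t.trim().is_empty()) {
        Some(t) => PageResult {
            text: t.to_string(),
            source: Source::Ocr,
            reason: Reason::OcrSucceeded,
        },
        // Fall back to native text and say so; never label this as Ocr.
        None => PageResult {
            text: native_text.to_string(),
            source: Source::Fallback,
            reason: Reason::OcrRequestedButUnavailable,
        },
    }
}
```

The point of the design is that a caller can always tell a genuine OCR result from a native fallback by inspecting `source` and `reason`, rather than guessing from the text.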

Added

  • Comprehensive auto extraction (#517) — a new, strictly additive surface that returns recoverable text decided per page/region with a machine-readable reason for every degraded result, and a graceful warn-and-fall-back-to-native policy (never a crash, never a silent empty). The classifier consumes pdf_oxide internals (Tr render-mode-3, GlyphlessFont/no-embedded ratio, notdef/U+FFFD, union of CTM-transformed image boxes, image codec, structure tree, producer/XMP) — strictly more accurate than a post-hoc heuristic on the flattened text. New: a configured-once AutoExtractor (new/text_only/with + fast/balanced/high_fidelity presets + builder), extract_text/extract_markdown/extract_html/extract_page/extract_document, the cheap classify_page/classify_document preflight (+ pages_needing_ocr), a one-shot PdfDocument::extract_text_auto, an enriched T0.5 text-quality gate (U+FFFD ratio + critical-fragmentation hard-trigger + a column-scramble/consecutive-repeat detector), an optional force_ocr_pages per-page OCR override, and build-time AutoExtractor::prefetch_models() / model_manifest() (the pdf-oxide models prefetch/manifest Dockerfile contract). Exposed across all seven bindings (Rust, C-ABI, Python, WASM, Node, C#, Go cgo+purego — Go via idiomatic functional options) as a frozen JSON envelope, plus CLI subcommands classify/auto/models and MCP tools classify/auto. Existing extract_text/CLI/MCP behaviour is byte-identical.
  • AutoExtractor semantics are precisely specified: TextOnly returns native text without classifying (the cheapest path); each per-page result reports its actual source/reason, so a native fallback after a failed/empty/absent OCR is Fallback + OcrRequestedButUnavailable — never mislabelled Ocr; classify_page/classify_document fail closed on encrypted-unauthenticated PDFs (a security op) while non-security per-page errors degrade gracefully; the "OCR unavailable" warning is emitted only when the ocr feature is absent; and model_cache_dir() resolves cross-platform (Windows %LOCALAPPDATA%/%USERPROFILE%, else $XDG_CACHE_HOME or $HOME/.cache; dependency-free).
  • The local-CPU tier ships via the existing ONNX OCR engine + spatial table detector; the SLANet + PP-DocLayout-S ONNX models are a documented zero-API-change point-release follow-up (tier-model-strategy.md §5) — the API, prefetch and manifest

... (truncated)
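The cross-platform lookup order stated for model_cache_dir() (Windows %LOCALAPPDATA% then %USERPROFILE%, otherwise $XDG_CACHE_HOME then $HOME/.cache) can be sketched without any dependencies. This is a minimal sketch and not the crate's implementation: `resolve_cache_dir` and `cache_dir_from_env` are hypothetical names, the environment lookup is passed in as a closure so the order can be exercised without touching the real environment, and the sketch stops at the base directory because the changelog does not say which subdirectory is appended.

```rust
use std::path::PathBuf;

/// Resolve a base cache directory from environment-style lookups.
/// `get` stands in for `std::env::var(key).ok()`.
fn resolve_cache_dir<F>(is_windows: bool, get: F) -> Option<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat an empty variable the same as an unset one (an assumption).
    let non_empty = |key: &str| get(key).filter(|v| !v.is_empty());
    if is_windows {
        non_empty("LOCALAPPDATA")
            .or_else(|| non_empty("USERPROFILE"))
            .map(PathBuf::from)
    } else {
        non_empty("XDG_CACHE_HOME")
            .map(PathBuf::from)
            .or_else(|| non_empty("HOME").map(|h| PathBuf::from(h).join(".cache")))
    }
}

/// Convenience wrapper that reads the real process environment.
fn cache_dir_from_env() -> Option<PathBuf> {
    resolve_cache_dir(cfg!(windows), |key| std::env::var(key).ok())
}
```

Returning `Option<PathBuf>` leaves the "nothing is set" case to the caller, which fits the warn-and-degrade policy the changelog describes better than panicking would.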

Commits
  • 6ecb4aa release: v0.3.51 — comprehensive auto extraction (typed reasons, graceful fal...
  • d1d35a3 release: v0.3.50 — True destructive PDF redaction, PAdES-B-T/B-LT LTV signatu...
  • 18ad69e release: v0.3.49 — off-byte-0 header recovery, sparse-trailer Catalog discove...
  • a17d82b release: v0.3.48 — office converter integration (closes #159) (#507)
  • 7c9e338 release: v0.3.47 — text-extraction quality, CJK + RTL fixes, table-detection ...
  • abfb85b chore: cargo fmt + clippy fixes; address copilot review
  • 1c198ad docs(changelog): note prose / TOC / underline false-positive table fix
  • ad48580 chore(ci): bump taiki-e/install-action from 2.75.22 to 2.77.6
  • 1c89a3a chore(ci): bump EmbarkStudios/cargo-deny-action from 2.0.17 to 2.0.18
  • b3f4e54 chore(ci): bump github/codeql-action from 4.35.3 to 4.35.4
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [pdf_oxide](https://github.com/yfedoseev/pdf_oxide) from 0.3.45 to 0.3.51.
- [Release notes](https://github.com/yfedoseev/pdf_oxide/releases)
- [Changelog](https://github.com/yfedoseev/pdf_oxide/blob/main/CHANGELOG.md)
- [Commits](yfedoseev/pdf_oxide@v0.3.45...v0.3.51)

---
updated-dependencies:
- dependency-name: pdf_oxide
  dependency-version: 0.3.51
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot bot commented on behalf of github on May 19, 2026

Labels

The following labels could not be found: dependencies. Please create it before Dependabot can add it to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.
