Fix garbled non-ASCII text in Compare with Server by Mosh-K · Pull Request #322 · Power-Maverick/DataverseDevTools-VSCode

Mosh-K · 2026-05-19T11:05:16Z

Problem

Compare with Server shows garbled text in the server pane for any web resource containing non-ASCII characters (Hebrew, Arabic, CJK, accented Latin, etc.). Because every multi-byte character is corrupted into different bytes, the diff editor falsely reports every line containing non-ASCII as changed — even when local and server are identical. This makes the compare feature unusable for any localized web resource.

Before / After

Before — server pane (left) shows double-encoded mojibake; the diff falsely reports every line as changed even though the files are identical:

After — server pane decodes correctly; the diff now reflects real differences only:

Root cause

In src/utils/ExtensionMethods.ts:

export const decodeFromBase64 = (str: string): string =>
    Buffer.from(str, "base64").toString("binary");

The Dataverse Web API returns the web resource content as base64-encoded UTF-8 bytes. Decoding with "binary" (an alias for Latin-1) maps each byte to one char, so a multi-byte UTF-8 sequence like 0xD7 0x90 (א) becomes the two Latin-1 chars × + \x90.

The result is then handed to fs.writeFileSync(path, data) with no encoding, which defaults to "utf8" — re-encoding those Latin-1 chars as UTF-8. Net effect: the original UTF-8 bytes are wrapped inside another layer of UTF-8 encoding — classic double-encoding / mojibake.
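The two steps above can be reproduced in isolation. The following is a minimal sketch, not code from the repo: it assumes the server payload is base64 of UTF-8 bytes as described, and it uses Buffer.from(str, "utf8") to stand in for what fs.writeFileSync does with a string and no encoding. All variable names are illustrative.

```typescript
import { Buffer } from "node:buffer";

// Simulated server payload: base64 of the UTF-8 bytes of a single Hebrew letter.
const original = "א"; // U+05D0, UTF-8 bytes D7 90
const base64FromServer = Buffer.from(original, "utf8").toString("base64");

// Step 1: the old decode maps every byte to one Latin-1 character.
const latin1Decoded = Buffer.from(base64FromServer, "base64").toString("binary");

// Step 2: writing that string as UTF-8 re-encodes each Latin-1 character.
// (Stand-in for fs.writeFileSync(path, data) with the default encoding.)
const bytesOnDisk = Buffer.from(latin1Decoded, "utf8");

const originalHex = Buffer.from(original, "utf8").toString("hex");
const onDiskHex = bytesOnDisk.toString("hex");

console.log(originalHex);          // d790
console.log(latin1Decoded.length); // 2
console.log(onDiskHex);            // c397c290
```

The two original bytes end up as four bytes on disk, which is why the diff editor sees a different line even when the local file is byte-identical to the server copy.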

ASCII-only resources are unaffected because Latin-1 and UTF-8 agree on bytes 0x00–0x7F, which is why this has gone unnoticed since the file was introduced in 2021.

Fix

One-line change — decode as UTF-8 instead of Latin-1:

-export const decodeFromBase64 = (str: string): string => Buffer.from(str, "base64").toString("binary");
+export const decodeFromBase64 = (str: string): string => Buffer.from(str, "base64").toString("utf8");
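As a rough check of the change, the sketch below runs both variants over a few sample strings. The helper names decodeBinary, decodeUtf8, and toBase64 are local to this example and are not part of the repo; toBase64 assumes the server encodes UTF-8 bytes, as described under Root cause.

```typescript
import { Buffer } from "node:buffer";

// The two variants from the diff, under local names for this sketch.
const decodeBinary = (str: string): string => Buffer.from(str, "base64").toString("binary");
const decodeUtf8 = (str: string): string => Buffer.from(str, "base64").toString("utf8");

// Assumed server behavior: content is UTF-8 bytes, base64-encoded.
const toBase64 = (text: string): string => Buffer.from(text, "utf8").toString("base64");

const samples = ["plain ascii", "שלום", "café", "日本語"];

const results = samples.map((text) => ({
    text,
    binaryRoundTrips: decodeBinary(toBase64(text)) === text,
    utf8RoundTrips: decodeUtf8(toBase64(text)) === text,
}));

for (const r of results) {
    console.log(`${r.text}: binary=${r.binaryRoundTrips} utf8=${r.utf8RoundTrips}`);
}
```

Only the ASCII sample should survive the "binary" decode, while every sample should survive the "utf8" decode, which matches the observation that ASCII-only resources were never affected.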

Why `encodeToBase64` was intentionally not changed

encodeToBase64 has the mirror "binary" argument, but it is harmless dead code on every call path in the repo. Every caller has the form:

encodeToBase64(readFileSync(fullPath))

…and readFileSync (in src/utils/FileSystem.ts) calls fs.readFileSync(source) with no encoding, which returns a Buffer. When the first argument to Buffer.from is already a Buffer, the encoding argument is ignored per Node's docs — the bytes are simply copied. So uploads round-trip correctly today.

Changing encodeToBase64 would be a no-op for current callers and risks silently changing behavior if someone in the future starts passing a string instead of a Buffer. This PR is therefore kept strictly to the broken decode path.
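The difference between Buffer and string input can be seen in a small sketch. The encodeSketch function below is a hypothetical stand-in, since the real encodeToBase64 signature is not shown in this PR; the cast exists only so the call type-checks for both input kinds.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical stand-in for encodeToBase64 (assumed shape, not the repo's code).
const encodeSketch = (input: Buffer | string): string =>
    Buffer.from(input as string, "binary").toString("base64");

const text = "א";
// What fs.readFileSync(path) without an encoding would hand back for a UTF-8 file.
const fileBytes = Buffer.from(text, "utf8");

// Buffer input: the "binary" argument is ignored and the bytes are copied.
const fromBuffer = encodeSketch(fileBytes);

// String input: "binary" applies, so each character becomes a single byte
// and code points above 0xFF are truncated.
const fromString = encodeSketch(text);

const expected = fileBytes.toString("base64");
console.log(fromBuffer === expected); // true
console.log(fromString === expected); // false
```

With a Buffer the result matches the file's bytes exactly; with a string it does not, which is the future risk described above.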

Test plan

Compared a web resource containing Hebrew before/after the fix — see screenshots above.
Verified an ASCII-only web resource still diffs identically to before (no false diffs, no encoding changes).
Confirmed upload path (uploadWebResource → encodeToBase64(readFileSync(...))) is unchanged and still produces correct base64 for both ASCII and non-ASCII content.

decodeFromBase64 was decoding base64 using "binary" (Latin-1), but the Dataverse Web API returns UTF-8 bytes. Each multi-byte UTF-8 sequence became one Latin-1 char per byte, and writeFileSync then re-encoded those chars as UTF-8 — producing double-encoded mojibake in the server pane of the diff view for any web resource containing non-ASCII content (Hebrew, Arabic, CJK, accented Latin, etc.). Switching the decode to "utf8" preserves the original bytes. ASCII-only resources are unaffected because Latin-1 and UTF-8 agree on 0x00-0x7F, which is why the bug went unnoticed for years. encodeToBase64 is intentionally left untouched: it is only ever called with a Buffer (readFileSync without an encoding returns a Buffer), and Buffer.from ignores the encoding argument when the input is already a Buffer — so the "binary" there is harmless dead-code on that path.

Add decodeFromBase64ToUTF8 alongside the existing decodeFromBase64 and switch the Compare-with-Server path to it. The existing function is kept unchanged for symmetry with encodeToBase64. Root cause: the Dataverse Web API returns web resource content as base64-encoded UTF-8 bytes. decodeFromBase64 decoded with "binary" (Latin-1), turning each multi-byte UTF-8 sequence into one Latin-1 char per byte; fs.writeFileSync then re-encoded those chars as UTF-8 — net result was double-encoded mojibake in the server pane of the diff, making every line containing non-ASCII appear as a false diff. ASCII-only resources were unaffected because Latin-1 and UTF-8 agree on bytes 0x00-0x7F, which is why the bug went unnoticed since 2021.

Mosh-K requested a review from Power-Maverick as a code owner May 19, 2026 11:05

Power-Maverick requested changes May 19, 2026


Comment thread src/utils/ExtensionMethods.ts Outdated

Mosh-K wants to merge 2 commits into Power-Maverick:main from Mosh-K:fix/decode-base64-utf8

2 participants
