fix: std.parseYaml wraps single-doc YAML with explicit --- in array by He-Pin · Pull Request #2 · He-Pin/sjsonnet

He-Pin · 2026-06-18T10:26:57Z

Summary

Fix std.parseYaml to wrap single-doc YAML with explicit --- in array, matching go-jsonnet.

Depends on: PR databricks#968 (YAML 1.2 octal fix)

Motivation

go-jsonnet treats YAML with an explicit document start marker (---) as a multi-document stream and always returns an array, even for a single document. sjsonnet returned the single document directly.

Note: jrsonnet 0.5.0-pre99 does NOT wrap single-doc YAML with --- in an array. This aligns sjsonnet with go-jsonnet.

Modification

Added a YamlDocStartPattern regex. When it matches and composeAll returns a single document, the result is wrapped in an array.
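The detect-and-wrap step can be sketched roughly as follows. This is a Python illustration, not the Scala code from the PR: the regex is an assumed reading of YamlDocStartPattern (--- followed by whitespace or end of input), `wrap_if_marked` is a hypothetical name, and `docs` stands in for the documents that composeAll would yield. Treating an empty stream as one null document is also an assumption.

```python
import re

# Assumed stand-in for YamlDocStartPattern: "---" at the very start of the
# input, followed by whitespace or end of input. The PR's real regex is not shown.
DOC_START = re.compile(r"---(?:\s|\Z)")


def wrap_if_marked(yaml_text, docs):
    """Shape the parse result as the PR describes.

    docs is the list of already-parsed documents (hypothetical input).
    """
    if len(docs) > 1:
        return list(docs)              # genuine multi-document stream
    # Assumption: an empty stream counts as a single null document.
    single = docs[0] if docs else None
    if DOC_START.match(yaml_text):
        return [single]                # explicit marker: always an array
    return single                      # no marker: bare document
```

Under these assumptions, `wrap_if_marked("---\na: 1", [{"a": 1}])` yields a one-element list, while `wrap_if_marked("a: 1", [{"a": 1}])` yields the bare object.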

Result

| YAML input | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `---` | [null] | null | null (bug) | [null] |
| `---\na: 1` | [{a:1}] | {a:1} | {a:1} (bug) | [{a:1}] |
| `a: 1` | {a:1} | {a:1} | {a:1} | {a:1} |

Test plan

All tests pass, scalafmt clean

… input (databricks#940)

## Motivation

`std.log`, `std.log2`, and `std.log10` produced generic downstream errors for negative and zero inputs (e.g., "not a number" or "overflow") instead of clear, function-specific error messages. go-jsonnet errors with "Not a number" for NaN results from `math.log`.

## Modification

- Added NaN checks after `math.log`, `math.log2`, and `math.log10` computations
- When the result is NaN, raises "[std.log] Not a number" (etc.) with position info
- Added tests for `log(-1)`, `log2(-1)`, `log(0)`, `log2(0)`, and valid inputs

## Result

`std.log(-1)` now errors with "[std.log] Not a number" instead of a generic downstream error, providing clearer diagnostics.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.log(-1)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.log] Not a number ✅ |
| `std.log2(0)` | ERROR: Overflow | ERROR: non-finite | ERROR: [std.log2] overflow | ERROR: [std.log2] overflow ✅ |
| `std.log(1)` | `0` | `0` | `0` ✅ | `0` ✅ |
| `std.log2(8)` | `3` | `3` | `3` ✅ | `3` ✅ |

…icks#945)

## Motivation

`std.regexPartialMatch` returned the full input string in the `string` field instead of the matched substring. For `std.regexPartialMatch("foo", "foobar")`, the `string` field was `"foobar"` instead of `"foo"`.

Note: `std.regexPartialMatch` is sjsonnet-specific (not in go-jsonnet or jrsonnet). The expected behavior follows Java's `Matcher.start()`/`Matcher.end()` semantics.

## Modification

- Changed `Val.Str(pos.noOffset, str)` to `Val.Str(pos.noOffset, str.substring(matcher.start(), matcher.end()))` in `NativeRegex.scala`
- Added tests for partial match at start, middle, and full match

## Result

`std.regexPartialMatch("foo", "foobar")` now correctly returns `{string: "foo", captures: [], namedCaptures: {}}` instead of `{string: "foobar", ...}`.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.regexPartialMatch("foo", "foobar")` | N/A (no such function) | N/A | `string: "foobar"` ❌ | `string: "foo"` ✅ |
| `std.regexPartialMatch("[0-9]+", "abc123def")` | N/A | N/A | `string: "abc123def"` ❌ | `string: "123"` ✅ |
| `std.regexFullMatch("foo", "foo")` | N/A | N/A | `string: "foo"` ✅ | `string: "foo"` ✅ |
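
The substring semantics can be illustrated with a small sketch. It uses Python's `re` module rather than the regex engine behind `NativeRegex.scala`, and `partial_match` is a hypothetical helper; returning `None` when nothing matches is an assumption, not something the commit message states.

```python
import re


def partial_match(pattern, text):
    """Return a result shaped like the one described above (illustrative only)."""
    m = re.search(pattern, text)
    if m is None:
        return None  # assumption: no match yields a null-like value
    return {
        # Slice by match offsets, the analogue of Matcher.start()/Matcher.end(),
        # instead of returning the whole input string.
        "string": text[m.start():m.end()],
        "captures": list(m.groups()),
        "namedCaptures": m.groupdict(),
    }
```

With this helper, `partial_match("[0-9]+", "abc123def")["string"]` is `"123"` rather than the full input.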

…ng original array (databricks#946)

## Motivation

`std.removeAt` silently returned the original array when given an invalid index (negative, out-of-bounds, or non-integer), instead of producing a clear error. go-jsonnet crashes with an internal error for negative indices; jrsonnet errors on non-integer indices but silently ignores negative ones.

## Modification

- Added bounds checking: negative and out-of-range indices now error with "idx out of bounds"
- Added integrality check: non-integer indices error with "idx must be an integer"
- Added directional tests for valid and invalid indices

## Result

`std.removeAt([1,2,3], -1)` now produces a clear "idx out of bounds" error instead of silently returning the original array.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.removeAt([1,2,3], 1)` | `[1,3]` | `[1,3]` | `[1,3]` ✅ | `[1,3]` ✅ |
| `std.removeAt([1,2,3], -1)` | CRASH (internal error) | `[1,2,3]` (wrong) | `[1,2,3]` (wrong) ❌ | ERROR: idx out of bounds ✅ |
| `std.removeAt([1,2,3], 1.5)` | ERROR: Expected an integer | ERROR: cannot convert | `[1,2,3]` (wrong) ❌ | ERROR: idx must be an integer ✅ |
| `std.removeAt([1,2,3], 10)` | CRASH (internal error) | `[1,2,3]` (wrong) | `[1,2,3]` (wrong) ❌ | ERROR: idx out of bounds ✅ |
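
A minimal Python sketch of the two checks is shown here for illustration. `remove_at` is a hypothetical helper, and the order of the checks (integrality first, then bounds) is an assumption; the commit message only gives the resulting error messages.

```python
def remove_at(arr, idx):
    """Remove the element at idx, rejecting invalid indices (illustrative only)."""
    # Integrality check: 1.5, NaN and infinities are not integer-valued.
    if not float(idx).is_integer():
        raise ValueError("idx must be an integer")
    i = int(idx)
    # Bounds check: negative and too-large indices are both rejected.
    if i < 0 or i >= len(arr):
        raise ValueError("idx out of bounds")
    return arr[:i] + arr[i + 1:]
```

Under this sketch, `remove_at([1, 2, 3], 1)` gives `[1, 3]`, while the indices `-1`, `10`, and `1.5` all raise instead of returning the original list.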

…atabricks#948)

## Motivation

Object comprehensions always used `Visibility.Normal` regardless of the field separator (`:`, `::`, `:::`), so `{[k]:: 1 for k in ["a"]}` incorrectly exposed the field as visible. go-jsonnet rejects hidden fields in comprehensions entirely; jrsonnet honors the visibility modifier.

## Modification

- Parser now passes the visibility modifier through to `ObjComp` expressions instead of hardcoding `Visibility.Normal`
- `ObjectScopeFactory` uses the propagated visibility instead of always `Visibility.Normal`
- Updated `Expr`, `ExprTransform`, `ScopedExprTransform`, and `Evaluator` to carry the visibility field
- Added tests for `::` (hidden), `:::` (forced), and `:` (normal) visibility in comprehensions

## Result

`{[k]:: 1 for k in ["a", "b"]}` now correctly hides fields (empty `std.objectFields` result), matching jrsonnet behavior.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.objectFields({[k]:: 1 for k in ["a","b"]})` | ERROR: cannot have hidden fields | `[]` | ParseError ❌ | `[]` ✅ |
| `std.objectFields({[k]: 1 for k in ["a","b"]})` | `["a","b"]` | `["a","b"]` | `["a","b"]` ✅ | `["a","b"]` ✅ |

> Note: go-jsonnet rejects hidden/forced fields in comprehensions. sjsonnet follows jrsonnet's approach of honoring the visibility modifier.

)

## Motivation

The `tryEagerEval` optimization forced evaluation of lazy thunks by calling `binding.value` on scope bindings, violating Jsonnet's lazy evaluation semantics. Unused bindings with side effects (e.g., `error`, `std.trace`) were evaluated even when their results were never needed.

## Modification

- Changed `resolveAsDouble` to pattern-match on the binding directly instead of forcing `.value`
- Only already-evaluated `Val.Num` bindings are used for eager evaluation
- Unevaluated thunks return `Double.NaN` (skip optimization), preserving lazy semantics
- Added test verifying unused error bindings are not forced through eager eval paths

## Result

`local a = error "should not be evaluated"; local b = a + 1; if false then b else 0` now correctly returns `0` without evaluating `a`, matching go-jsonnet and jrsonnet lazy evaluation semantics.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `local f(x) = x; local y = f(error "boom"); if true then 42 else y` | `42` | `42` | ERROR (forced thunk) ❌ | `42` ✅ |
| `local a = error "x"; if false then a + 1 else 0` | `0` | `0` | `0` ✅ | `0` ✅ |
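
The idea of peeking at a binding without forcing it can be sketched as below. This is a Python illustration under stated assumptions: `Lazy`, `force`, and `peek_number` are hypothetical names standing in for sjsonnet's lazy bindings and `resolveAsDouble`, and NaN is used as the "skip the optimization" signal as the commit message describes.

```python
import math


class Lazy:
    """A memoizing thunk (hypothetical stand-in for a lazy scope binding)."""

    def __init__(self, compute):
        self._compute = compute
        self._evaluated = False
        self._value = None

    def force(self):
        # Evaluate on first use and cache the result.
        if not self._evaluated:
            self._value = self._compute()
            self._evaluated = True
        return self._value

    def peek_number(self):
        # Never triggers evaluation: only an already-computed number is
        # returned; anything else signals "skip" with NaN.
        v = self._value
        if self._evaluated and isinstance(v, (int, float)) and not isinstance(v, bool):
            return float(v)
        return math.nan


def boom():
    raise RuntimeError("should not be evaluated")
```

Calling `Lazy(boom).peek_number()` returns NaN without raising, whereas `Lazy(boom).force()` would raise; that difference is what preserves laziness for unused bindings.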

…#950)

## Motivation

`std.asin`, `std.acos`, and `std.pow` silently returned `NaN` for out-of-domain inputs. go-jsonnet's `makeDoubleCheck` errors with "not a number" for NaN results from all math functions.

## Modification

- Added post-computation NaN checks to `std.asin`, `std.acos`, and `std.pow`
- When the result is NaN, raises "[std.asin] not a number" (etc.) with position info
- Added tests for error cases: `asin(2)`, `acos(2)`, `pow(-1, 0.5)` and valid inputs
- Updated `pow4.jsonnet` golden file to match new error output

## Result

Out-of-domain math function calls now error clearly with function-specific context instead of a generic downstream NaN error, matching go-jsonnet's `makeDoubleCheck` behavior.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.asin(2)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.asin] not a number ✅ |
| `std.acos(2)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.acos] not a number ✅ |
| `std.pow(-1, 0.5)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.pow] not a number ✅ |
| `std.asin(0.5)` | `0.5236...` | `0.5236...` | `0.5236...` ✅ | `0.5236...` ✅ |

…cks#952)

## Motivation

`std.isEmpty` incorrectly accepted function values (treating zero-arity as empty, multi-arity as non-empty) and the error message referenced "length" instead of "isEmpty". go-jsonnet and jrsonnet reject non-string types with a type error.

## Modification

- Removed `Val.Func` case from `isEmpty`, so function inputs now fall through to the error case
- Fixed error message from "length operates on strings, objects, and arrays" to "isEmpty operates on strings, objects, and arrays"
- Removed unnecessary `.value` call in error path (`Val.value` returns `this`)
- Updated `builtinIsEmpty2.jsonnet.golden`, `StdLibOfficialCompatibilityTests`, and `Std0150FunctionsTests`
- Added directional tests for valid types and error cases

## Result

`std.isEmpty` now rejects function inputs with a clear error message. Non-string types still produce "isEmpty operates on strings, objects, and arrays, got {type}".

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.isEmpty(function(x) x)` | ERROR: expected string | ERROR: expected string | `false` ❌ | ERROR ✅ |
| `std.isEmpty(42)` | ERROR: expected string | ERROR: expected string | ERROR: "length operates" ❌ | ERROR: "isEmpty operates" ✅ |
| `std.isEmpty("")` | `true` | `true` | `true` ✅ | `true` ✅ |
| `std.isEmpty([])` | ERROR: expected string | ERROR: expected string | `true` (extension) | `true` (extension) |

> Note: sjsonnet extends `isEmpty` to accept arrays and objects, which go-jsonnet and jrsonnet reject. This is a deliberate sjsonnet extension.

…ge (databricks#954)

## Motivation

`mergeMember` threw a raw Scala `MatchError` when `+:` nested field inheritance encountered incompatible type combinations (e.g., `{a: true} + {a+: 1}`), producing an unhelpful internal error instead of a user-friendly message.

## Modification

- Replaced `throw new MatchError((l, r))` with `Error.fail("Cannot merge " + l.prettyName + " with " + r.prettyName, pos)` in `Val.scala`
- Added directional tests: compatible types (string, number, array) still merge correctly; incompatible types produce descriptive errors

## Result

`{a: true} + {a+: 1}` now produces "Cannot merge boolean with number" instead of "Internal error: MatchError", matching go-jsonnet's descriptive error style.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `{a: true} + {a+: 1}` | ERROR: Unexpected type boolean | ERROR: not implemented | Internal error: MatchError ❌ | Cannot merge boolean with number ✅ |
| `{a: "x"} + {a+: "y"}` | `{a: "xy"}` | `{a: "xy"}` | `{a: "xy"}` ✅ | `{a: "xy"}` ✅ |
| `{a: 1} + {a+: 2}` | `{a: 3}` | `{a: 3}` | `{a: 3}` ✅ | `{a: 3}` ✅ |
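
The fallback from an internal match failure to a descriptive error can be sketched as follows. This Python version is illustrative only: `pretty_name` and `merge_member` are hypothetical analogues of `prettyName` and `mergeMember`, only the string, number, and array cases named in the commit message are handled, and object merging is deliberately left out.

```python
def pretty_name(v):
    """Jsonnet-style type name for a Python value (illustrative mapping)."""
    if v is None:
        return "null"
    if isinstance(v, bool):  # must come before the number check
        return "boolean"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    if isinstance(v, list):
        return "array"
    if isinstance(v, dict):
        return "object"
    return type(v).__name__


def merge_member(l, r):
    """Merge two field values for `+:`; unsupported pairs get a clear error."""
    kinds = (pretty_name(l), pretty_name(r))
    if kinds in {("string", "string"), ("number", "number"), ("array", "array")}:
        return l + r
    # Descriptive error instead of letting an internal failure escape.
    raise TypeError("Cannot merge " + kinds[0] + " with " + kinds[1])
```

Here `merge_member("x", "y")` gives `"xy"` and `merge_member(1, 2)` gives `3`, while `merge_member(True, 1)` raises with the message "Cannot merge boolean with number".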

…cks#955)

## Motivation

NaN handling was inconsistent across four arithmetic evaluation paths: the main evaluator, comprehension fast path, inline optimizer, and double-fast-path. Some paths detected NaN results and errored, while others silently propagated NaN values.

## Modification

- Added NaN result checks to `OP_+`, `OP_-`, `OP_*` in all four arithmetic paths
- Added NaN checks to `OP_%` and `OP_/` where they were previously missing
- All paths now consistently report "not a number" when arithmetic produces NaN (e.g., `Infinity + (-Infinity)`, `0 * Infinity`)
- Added directional tests for NaN-producing operations and regular arithmetic

## Result

Arithmetic operations that produce NaN now consistently error across all evaluation paths, eliminating the behavioral inconsistency between array comprehension and top-level evaluation.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `local x = std.pow(2, 1024); x + (-x)` | ERROR: Overflow | ERROR: non-finite | Inconsistent (varies by path) ❌ | ERROR: not a number ✅ |
| `local x = std.pow(2, 1024); 0 * x` | ERROR: Overflow | ERROR: non-finite | Inconsistent ❌ | ERROR: not a number ✅ |
| `1 + 2` | `3` | `3` | `3` ✅ | `3` ✅ |
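
One way to keep several evaluation paths consistent is to route every arithmetic result through a single check. The Python sketch below illustrates that idea; `checked_arith` is a hypothetical helper, and whether sjsonnet shares one helper across its four paths or repeats the check in each is not stated in the commit message.

```python
import math
import operator


def checked_arith(op, a, b):
    """Apply a binary arithmetic operator and reject NaN results."""
    result = op(a, b)
    if math.isnan(result):
        raise ValueError("not a number")
    return result


INF = float("inf")  # stands in for an overflowing value such as Infinity
```

With this helper, `checked_arith(operator.add, 1, 2)` returns `3`, while `checked_arith(operator.add, INF, -INF)` and `checked_arith(operator.mul, 0, INF)` both raise "not a number".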

…ation (databricks#956)

## Motivation

`TomlRenderer.visitFloat64` rendered large integers like `1e20` as `"1.0E20"` (scientific notation), which is invalid TOML integer format. go-jsonnet renders `1e20` as `100000000000000000000`. The `Renderer` and `BaseCharRenderer` had `BigDecimal` fallback logic, but `TomlRenderer` was missing it.

## Modification

- Added `BigDecimal` fallback between the Long-range check and `Double.toString`: when `math.round(d).toDouble != d` but `d % 1 == 0`, uses `BigDecimal.toBigInt` for exact decimal output
- Added directional tests: large integer (1e20), regular integer (42), and fraction (3.14)

## Result

`std.manifestToml({a: 1e20})` now renders `a = 100000000000000000000` instead of `a = 1.0E20`, matching go-jsonnet and producing valid TOML output.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.manifestToml({a: 1e20})` | `a = 100000000000000000000` | `a = 100000000000000000000` | `a = 1.0E20` ❌ | `a = 100000000000000000000` ✅ |
| `std.manifestToml({a: 42})` | `a = 42` | `a = 42` | `a = 42` ✅ | `a = 42` ✅ |
| `std.manifestToml({a: 3.14})` | `a = 3.14` | `a = 3.14` | `a = 3.14` ✅ | `a = 3.14` ✅ |
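
The rendering rule can be illustrated in a few lines of Python. This is a sketch, not the `TomlRenderer` code: `render_toml_number` is a hypothetical helper, and Python's arbitrary-precision `int` plays the role that `BigDecimal.toBigInt` plays on the JVM, so no separate Long-range branch is needed. Non-finite values are not handled specially.

```python
def render_toml_number(d):
    """Render a double as a TOML number (illustrative only)."""
    d = float(d)
    if d.is_integer():
        # Integer-valued doubles print as exact decimal digits, however large.
        return str(int(d))
    # Fractions fall back to the shortest round-trip float representation.
    return repr(d)
```

Since 1e20 is exactly representable as a double, `render_toml_number(1e20)` gives `100000000000000000000` rather than a scientific-notation string.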

…ce (databricks#957)

## Motivation

`YamlRenderer` block scalar rendering had two bugs: (1) multiple trailing newlines were lost because `Pattern.split` discards trailing empty strings and the code always used `|` (clip) mode; (2) strings with leading whitespace were missing the required YAML indent indicator (e.g., `|2`), causing YAML parsers to misinterpret leading spaces as structural indentation.

Additionally, `PrettyYamlRenderer` had an off-by-one bug (`len > 2` instead of `len > 1`) that caused 2-character strings like `"\n\n"` to use clip mode `|` instead of keep mode `|+`.

## Modification

- Changed `split(s.toString)` to `split(str, -1)` to preserve trailing empty strings
- Added detection of multiple trailing newlines to use `|+` (keep) mode instead of `|` (clip)
- Added `blockOffsetNumeral` for leading whitespace indent indicator when first character is a space
- Used `appendString(blockStyle)` for the block style string (append only accepts Char/Int)
- Fixed `PrettyYamlRenderer` off-by-one: `len > 2` → `len > 1`

## Result

YAML block scalars now correctly preserve multiple trailing newlines and include indent indicators for leading whitespace, producing spec-compliant YAML output.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.manifestYamlDoc({a: "hello\n\n\n"})` | `"a": \|+` with trailing newlines | `"a": \|+` with trailing newlines | `"a": \|` truncated ❌ | `"a": \|+` correct ✅ |
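
The header choice and the line split can be sketched as below. This is a Python illustration under stated assumptions: `block_header` and `block_lines` are hypothetical helpers, the indent value `2` and the ordering of the indent and chomping indicators are assumptions, and the strip (`|-`) case for strings with no trailing newline is left out because the commit message does not cover it.

```python
def block_header(s, indent=2):
    """Pick a block scalar header for s (illustrative only)."""
    header = "|"
    if s.startswith(" "):
        # Leading space: add an explicit indentation indicator, e.g. "|2".
        header += str(indent)
    trailing = len(s) - len(s.rstrip("\n"))
    if trailing > 1:
        # More than one trailing newline: keep mode instead of clip.
        header += "+"
    return header


def block_lines(s):
    # Python's str.split keeps trailing empty strings, which is the behavior
    # the PR obtains on the JVM by passing -1 as the split limit.
    return s.split("\n")
```

For `"hello\n\n\n"` this gives the header `|+` and four split parts (the text plus three empty strings), so the trailing newlines survive; `"\n\n"` also selects `|+`.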

Motivation: SnakeYAML's SafeConstructor uses YAML 1.1 implicit type resolution which does not recognize the 0o prefix for octal integers introduced in YAML 1.2. This caused std.parseYaml to treat unquoted 0o777 as the string "0o777" instead of the integer 511, diverging from go-jsonnet and jrsonnet.

Modification: Replaced SafeConstructor-based parsing with composeAll() which gives access to raw YAML nodes with scalar style information. Added yamlNodeToJson() that handles YAML 1.2 octal (0o prefix) for plain (unquoted) scalars while correctly preserving quoted values as strings. Also handles all other YAML scalar types (int, float, bool, null) with full YAML 1.1 compatibility.

Result: std.parseYaml now correctly parses both legacy (0777) and modern (0o777) octal syntax for unquoted values, while quoted "0o777" remains a string, matching go-jsonnet and jrsonnet behavior exactly.

| YAML input | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| 0777 | 511 | 511 | 511 | 511 |
| 0o777 | 511 | 511 | "0o777" (bug) | 511 |
| 0o10 | 8 | 8 | "0o10" (bug) | 8 |
| -0o777 | -511 | -511 | "-0o777" (bug) | -511 |
| "0o777" | "0o777" | "0o777" | "0o777" | "0o777" |
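
The octal handling for plain scalars can be sketched as follows. This Python version only illustrates the rule described above; `resolve_scalar` and its `quoted` flag are hypothetical stand-ins for yamlNodeToJson() and the scalar style information, and the other scalar types (float, bool, null, decimal int) are omitted.

```python
import re

# Plain-scalar octal forms: YAML 1.2 style (0o777) and legacy YAML 1.1 style (0777).
OCTAL_12 = re.compile(r"([-+]?)0o([0-7]+)")
OCTAL_11 = re.compile(r"([-+]?)0([0-7]+)")


def resolve_scalar(text, quoted):
    """Resolve one scalar; only the octal cases are modeled here."""
    if quoted:
        return text  # quoted values always stay strings
    for pattern in (OCTAL_12, OCTAL_11):
        m = pattern.fullmatch(text)
        if m:
            value = int(m.group(2), 8)
            return -value if m.group(1) == "-" else value
    return text  # everything else is left untouched in this sketch
```

Under this sketch, `resolve_scalar("0o777", False)` and `resolve_scalar("0777", False)` both give `511`, while `resolve_scalar("0o777", True)` stays the string `"0o777"`.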

Motivation: go-jsonnet treats YAML input containing an explicit document start marker (---) as a multi-document stream, always returning an array even when there is only one document. sjsonnet returned the single document directly, diverging from go-jsonnet for inputs like "---", "---\n", and "---\na: 1". Note: jrsonnet 0.5.0-pre99 does NOT wrap single-doc YAML with --- in an array (returns null for "---"), so this aligns sjsonnet with go-jsonnet's stricter behavior.

Modification: Added YamlDocStartPattern regex that detects --- followed by whitespace or end-of-string (per YAML spec). When detected and composeAll returns a single document, the result is wrapped in an array. Updated existing ParseYaml tests and go_test_suite golden file to match go-jsonnet.

Result: std.parseYaml now correctly handles YAML document start markers, matching go-jsonnet behavior for all edge cases.

| YAML input | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| "---" | [null] | null | null (bug) | [null] |
| "---\n" | [null] | null | null (bug) | [null] |
| "---\na: 1" | [{a:1}] | {a:1} | {a:1} (bug) | [{a:1}] |
| "--- 3\n" | [3] | 3 | 3 (bug) | [3] |
| "a: 1" | {a:1} | {a:1} | {a:1} | {a:1} |

He-Pin added 13 commits June 17, 2026 20:39

He-Pin closed this Jun 18, 2026

He-Pin deleted the fix/parseyaml-doc-marker-v2 branch June 18, 2026 10:31

He-Pin wants to merge 13 commits into master from fix/parseyaml-doc-marker-v2
