fix: std.parseYaml wraps single-doc YAML with explicit --- in array #2

Closed
He-Pin wants to merge 13 commits into master from fix/parseyaml-doc-marker-v2

@He-Pin He-Pin commented Jun 18, 2026

Summary

Fix std.parseYaml to wrap a single-document YAML input that starts with an explicit --- marker in an array, matching go-jsonnet.

Depends on: PR databricks#968 (YAML 1.2 octal fix)

Motivation

go-jsonnet treats YAML with an explicit document start marker (---) as a multi-document stream and always returns an array. sjsonnet returned the single document directly.

Note: jrsonnet 0.5.0-pre99 does NOT wrap single-doc YAML with --- in an array. This aligns sjsonnet with go-jsonnet.

Modification

Added a YamlDocStartPattern regex. When it matches and composeAll returns a single document, the result is wrapped in an array.

Result

| YAML input | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `---` | [null] | null | null (bug) | [null] |
| `---\na: 1` | [{a:1}] | {a:1} | {a:1} (bug) | [{a:1}] |
| `a: 1` | {a:1} | {a:1} | {a:1} | {a:1} |

Test plan

  • All tests pass, scalafmt clean

He-Pin added 13 commits June 17, 2026 20:39
… input (databricks#940)

## Motivation

`std.log`, `std.log2`, and `std.log10` produced generic downstream
errors for negative and zero inputs (e.g., "not a number" or "overflow")
instead of clear, function-specific error messages. go-jsonnet errors
with "Not a number" for NaN results from `math.log`.

## Modification

- Added NaN checks after `math.log`, `math.log2`, and `math.log10`
computations
- When the result is NaN, raises "[std.log] Not a number" (etc.) with
position info
- Added tests for `log(-1)`, `log2(-1)`, `log(0)`, `log2(0)`, and valid
inputs
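
The post-computation check described in the list above can be sketched in Java. This is only an illustration, not the sjsonnet source: the `CheckedLog` class, its method name, and the exception type are hypothetical. It relies on `Math.log` returning NaN for negative inputs and negative infinity for zero.

```java
// Hypothetical sketch; class, method, and exception type are illustrative assumptions.
final class CheckedLog {
    /** Natural log that reports a function-specific error when the result is NaN. */
    static double log(double x) {
        double result = Math.log(x);
        if (Double.isNaN(result)) {
            // Math.log yields NaN for negative (or NaN) inputs.
            throw new ArithmeticException("[std.log] Not a number");
        }
        // Math.log(0.0) is -Infinity, not NaN, so a zero input is not caught by this check.
        return result;
    }
}
```

Because a zero input produces an infinity rather than NaN, it would have to be reported by a separate overflow check, which is consistent with the `std.log2(0)` row in the table below.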

## Result

`std.log(-1)` now errors with "[std.log] Not a number" instead of a
generic downstream error, providing clearer diagnostics.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.log(-1)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.log] Not a number ✅ |
| `std.log2(0)` | ERROR: Overflow | ERROR: non-finite | ERROR: [std.log2] overflow | ERROR: [std.log2] overflow ✅ |
| `std.log(1)` | `0` | `0` | `0` ✅ | `0` ✅ |
| `std.log2(8)` | `3` | `3` | `3` ✅ | `3` ✅ |
…icks#945)

## Motivation

`std.regexPartialMatch` returned the full input string in the `string`
field instead of the matched substring. For
`std.regexPartialMatch("foo", "foobar")`, the `string` field was
`"foobar"` instead of `"foo"`.

Note: `std.regexPartialMatch` is sjsonnet-specific (not in go-jsonnet or
jrsonnet). The expected behavior follows Java's
`Matcher.start()`/`Matcher.end()` semantics.

## Modification

- Changed `Val.Str(pos.noOffset, str)` to `Val.Str(pos.noOffset,
str.substring(matcher.start(), matcher.end()))` in `NativeRegex.scala`
- Added tests for partial match at start, middle, and full match
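
The `Matcher.start()`/`Matcher.end()` semantics referenced above can be shown with plain `java.util.regex`. The `PartialMatch` class and its `matched` method are hypothetical names for this sketch. The real code builds a `Val.Str`, whereas this version simply returns the substring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the NativeRegex.scala implementation.
final class PartialMatch {
    /** Returns the first matched substring, or null when the pattern occurs nowhere in the input. */
    static String matched(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // Slice out only the matched region instead of returning the whole input.
        return input.substring(m.start(), m.end());
    }
}
```

With this helper, `PartialMatch.matched("[0-9]+", "abc123def")` yields `"123"` rather than the full input string.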

## Result

`std.regexPartialMatch("foo", "foobar")` now correctly returns `{string:
"foo", captures: [], namedCaptures: {}}` instead of `{string: "foobar",
...}`.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.regexPartialMatch("foo", "foobar")` | N/A (no such function) | N/A | `string: "foobar"` ❌ | `string: "foo"` ✅ |
| `std.regexPartialMatch("[0-9]+", "abc123def")` | N/A | N/A | `string: "abc123def"` ❌ | `string: "123"` ✅ |
| `std.regexFullMatch("foo", "foo")` | N/A | N/A | `string: "foo"` ✅ | `string: "foo"` ✅ |
…ng original array (databricks#946)

## Motivation

`std.removeAt` silently returned the original array when given an
invalid index (negative, out-of-bounds, or non-integer), instead of
producing a clear error. go-jsonnet crashes with an internal error for
negative indices; jrsonnet errors on non-integer indices but silently
ignores negative ones.

## Modification

- Added bounds checking: negative and out-of-range indices now error
with "idx out of bounds"
- Added integrality check: non-integer indices error with "idx must be
an integer"
- Added directional tests for valid and invalid indices
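
A minimal Java sketch of the two checks listed above, assuming the index arrives as a double (as Jsonnet numbers do). The `RemoveAt` class and the exception types are hypothetical. Only the two error messages come from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and exception types are illustrative assumptions.
final class RemoveAt {
    /** Returns a copy of arr without the element at idx, validating idx first. */
    static <T> List<T> removeAt(List<T> arr, double idx) {
        // Integrality check: 1.5 (and NaN) are rejected before any bounds test.
        if (idx != Math.rint(idx)) {
            throw new IllegalArgumentException("idx must be an integer");
        }
        // Bounds check: negative and too-large indices are errors, not no-ops.
        if (idx < 0 || idx >= arr.size()) {
            throw new IndexOutOfBoundsException("idx out of bounds");
        }
        List<T> out = new ArrayList<>(arr);
        out.remove((int) idx); // int argument selects remove-by-index
        return out;
    }
}
```

The integrality test runs first so that a fractional index is reported as a type problem rather than a range problem, which matches the `1.5` row in the table below.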

## Result

`std.removeAt([1,2,3], -1)` now produces a clear "idx out of bounds"
error instead of silently returning the original array.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.removeAt([1,2,3], 1)` | `[1,3]` | `[1,3]` | `[1,3]` ✅ | `[1,3]` ✅ |
| `std.removeAt([1,2,3], -1)` | CRASH (internal error) | `[1,2,3]` (wrong) | `[1,2,3]` (wrong) ❌ | ERROR: idx out of bounds ✅ |
| `std.removeAt([1,2,3], 1.5)` | ERROR: Expected an integer | ERROR: cannot convert | `[1,2,3]` (wrong) ❌ | ERROR: idx must be an integer ✅ |
| `std.removeAt([1,2,3], 10)` | CRASH (internal error) | `[1,2,3]` (wrong) | `[1,2,3]` (wrong) ❌ | ERROR: idx out of bounds ✅ |
…atabricks#948)

## Motivation

Object comprehensions always used `Visibility.Normal` regardless of the
field separator (`:`, `::`, `:::`), so `{[k]:: 1 for k in ["a"]}`
incorrectly exposed the field as visible. go-jsonnet rejects hidden
fields in comprehensions entirely; jrsonnet honors the visibility
modifier.

## Modification

- Parser now passes the visibility modifier through to `ObjComp`
expressions instead of hardcoding `Visibility.Normal`
- `ObjectScopeFactory` uses the propagated visibility instead of always
`Visibility.Normal`
- Updated `Expr`, `ExprTransform`, `ScopedExprTransform`, and
`Evaluator` to carry the visibility field
- Added tests for `::` (hidden), `:::` (forced), and `:` (normal)
visibility in comprehensions

## Result

`{[k]:: 1 for k in ["a", "b"]}` now correctly hides fields (empty
`std.objectFields` result), matching jrsonnet behavior.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.objectFields({[k]:: 1 for k in ["a","b"]})` | ERROR: cannot have hidden fields | `[]` | ParseError ❌ | `[]` ✅ |
| `std.objectFields({[k]: 1 for k in ["a","b"]})` | `["a","b"]` | `["a","b"]` | `["a","b"]` ✅ | `["a","b"]` ✅ |

> Note: go-jsonnet rejects hidden/forced fields in comprehensions.
sjsonnet follows jrsonnet's approach of honoring the visibility
modifier.
)

## Motivation

The `tryEagerEval` optimization forced evaluation of lazy thunks by
calling `binding.value` on scope bindings, violating Jsonnet's lazy
evaluation semantics. Unused bindings with side effects (e.g., `error`,
`std.trace`) were evaluated even when their results were never needed.

## Modification

- Changed `resolveAsDouble` to pattern-match on the binding directly
instead of forcing `.value`
- Only already-evaluated `Val.Num` bindings are used for eager
evaluation
- Unevaluated thunks return `Double.NaN` (skip optimization), preserving
lazy semantics
- Added test verifying unused error bindings are not forced through
eager eval paths
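
The idea of inspecting a binding without forcing it can be sketched in Java. Everything here is hypothetical (`LazyBinding`, `isEvaluated`, the use of `Supplier`), and sjsonnet's real binding and value types differ. The point is only that the resolver returns `Double.NaN` for an unevaluated thunk instead of calling `value()`.

```java
import java.util.function.Supplier;

// Hypothetical sketch; sjsonnet's real binding and value types differ.
final class LazyBinding {
    private Supplier<Object> thunk; // pending computation, cleared once forced
    private Object cached;

    LazyBinding(Supplier<Object> thunk) {
        this.thunk = thunk;
    }

    boolean isEvaluated() {
        return thunk == null;
    }

    /** Forces the binding (at most once) and returns its value. */
    Object value() {
        if (thunk != null) {
            cached = thunk.get();
            thunk = null;
        }
        return cached;
    }

    /** Returns the number only if already evaluated; NaN means "skip the optimization". */
    static double resolveAsDouble(LazyBinding b) {
        if (b.isEvaluated() && b.cached instanceof Double) {
            return (Double) b.cached;
        }
        return Double.NaN; // value() is never called here, so nothing is forced
    }
}
```

A binding whose thunk would throw can therefore pass through `resolveAsDouble` safely, and the error surfaces only if some other code actually asks for the value.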

## Result

`local a = error "should not be evaluated"; local b = a + 1; if false
then b else 0` now correctly returns `0` without evaluating `a`,
matching go-jsonnet and jrsonnet lazy evaluation semantics.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `local f(x) = x; local y = f(error "boom"); if true then 42 else y` | `42` | `42` | ERROR (forced thunk) ❌ | `42` ✅ |
| `local a = error "x"; if false then a + 1 else 0` | `0` | `0` | `0` ✅ | `0` ✅ |
…#950)

## Motivation

`std.asin`, `std.acos`, and `std.pow` silently returned `NaN` for
out-of-domain inputs. go-jsonnet's `makeDoubleCheck` errors with "not a
number" for NaN results from all math functions.

## Modification

- Added post-computation NaN checks to `std.asin`, `std.acos`, and
`std.pow`
- When the result is NaN, raises "[std.asin] not a number" (etc.) with
position info
- Added tests for error cases: `asin(2)`, `acos(2)`, `pow(-1, 0.5)` and
valid inputs
- Updated `pow4.jsonnet` golden file to match new error output

## Result

Out-of-domain math function calls now error clearly with
function-specific context instead of a generic downstream NaN error,
matching go-jsonnet's `makeDoubleCheck` behavior.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.asin(2)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.asin] not a number ✅ |
| `std.acos(2)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.acos] not a number ✅ |
| `std.pow(-1, 0.5)` | ERROR: Not a number | ERROR: non-finite | ERROR: not a number (generic) | ERROR: [std.pow] not a number ✅ |
| `std.asin(0.5)` | `0.5236...` | `0.5236...` | `0.5236...` ✅ | `0.5236...` ✅ |
…cks#952)

## Motivation

`std.isEmpty` incorrectly accepted function values (treating zero-arity
as empty, multi-arity as non-empty) and the error message referenced
"length" instead of "isEmpty". go-jsonnet and jrsonnet reject non-string
types with a type error.

## Modification

- Removed `Val.Func` case from `isEmpty`, so function inputs now fall
through to the error case
- Fixed error message from "length operates on strings, objects, and
arrays" to "isEmpty operates on strings, objects, and arrays"
- Removed unnecessary `.value` call in error path (`Val.value` returns
`this`)
- Updated `builtinIsEmpty2.jsonnet.golden`,
`StdLibOfficialCompatibilityTests`, and `Std0150FunctionsTests`
- Added directional tests for valid types and error cases

## Result

`std.isEmpty` now rejects function inputs with a clear error message.
Other unsupported types (for example, numbers) still produce "isEmpty
operates on strings, objects, and arrays, got {type}".

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.isEmpty(function(x) x)` | ERROR: expected string | ERROR: expected string | `false` ❌ | ERROR ✅ |
| `std.isEmpty(42)` | ERROR: expected string | ERROR: expected string | ERROR: "length operates" ❌ | ERROR: "isEmpty operates" ✅ |
| `std.isEmpty("")` | `true` | `true` | `true` ✅ | `true` ✅ |
| `std.isEmpty([])` | ERROR: expected string | ERROR: expected string | `true` (extension) | `true` (extension) |

> Note: sjsonnet extends `isEmpty` to accept arrays and objects, which
go-jsonnet and jrsonnet reject. This is a deliberate sjsonnet extension.
…ge (databricks#954)

## Motivation

`mergeMember` threw a raw Scala `MatchError` when `+:` nested field
inheritance encountered incompatible type combinations (e.g., `{a: true}
+ {a+: 1}`), producing an unhelpful internal error instead of a
user-friendly message.

## Modification

- Replaced `throw new MatchError((l, r))` with `Error.fail("Cannot merge
" + l.prettyName + " with " + r.prettyName, pos)` in `Val.scala`
- Added directional tests: compatible types (string, number, array)
still merge correctly; incompatible types produce descriptive errors

## Result

`{a: true} + {a+: 1}` now produces "Cannot merge boolean with number"
instead of "Internal error: MatchError", matching go-jsonnet's
descriptive error style.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `{a: true} + {a+: 1}` | ERROR: Unexpected type boolean | ERROR: not implemented | Internal error: MatchError ❌ | Cannot merge boolean with number ✅ |
| `{a: "x"} + {a+: "y"}` | `{a: "xy"}` | `{a: "xy"}` | `{a: "xy"}` ✅ | `{a: "xy"}` ✅ |
| `{a: 1} + {a+: 2}` | `{a: 3}` | `{a: 3}` | `{a: 3}` ✅ | `{a: 3}` ✅ |
…cks#955)

## Motivation

NaN handling was inconsistent across four arithmetic evaluation paths:
the main evaluator, comprehension fast path, inline optimizer, and
double-fast-path. Some paths detected NaN results and errored, while
others silently propagated NaN values.

## Modification

- Added NaN result checks to `OP_+`, `OP_-`, `OP_*` in all four
arithmetic paths
- Added NaN checks to `OP_%` and `OP_/` where they were previously
missing
- All paths now consistently report "not a number" when arithmetic
produces NaN (e.g., `Infinity + (-Infinity)`, `0 * Infinity`)
- Added directional tests for NaN-producing operations and regular
arithmetic
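
One way to keep several evaluation paths consistent is to route every result through a single check. The Java sketch below assumes that approach. The `CheckedArith` class and its methods are hypothetical and do not mirror the four sjsonnet paths. It also shows the two NaN-producing cases named above.

```java
// Hypothetical sketch: one shared NaN check reused by every operator.
final class CheckedArith {
    /** Passes the result through unless it is NaN. */
    static double checked(double result) {
        if (Double.isNaN(result)) {
            throw new ArithmeticException("not a number");
        }
        return result;
    }

    static double add(double a, double b) {
        return checked(a + b); // Infinity + (-Infinity) is NaN under IEEE 754
    }

    static double mul(double a, double b) {
        return checked(a * b); // 0 * Infinity is NaN under IEEE 754
    }
}
```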

## Result

Arithmetic operations that produce NaN now consistently error across all
evaluation paths, eliminating the behavioral inconsistency between array
comprehension and top-level evaluation.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `local x = std.pow(2, 1024); x + (-x)` | ERROR: Overflow | ERROR: non-finite | Inconsistent (varies by path) ❌ | ERROR: not a number ✅ |
| `local x = std.pow(2, 1024); 0 * x` | ERROR: Overflow | ERROR: non-finite | Inconsistent ❌ | ERROR: not a number ✅ |
| `1 + 2` | `3` | `3` | `3` ✅ | `3` ✅ |
…ation (databricks#956)

## Motivation

`TomlRenderer.visitFloat64` rendered large integers like `1e20` as
`"1.0E20"` (scientific notation), which is invalid TOML integer format.
go-jsonnet renders `1e20` as `100000000000000000000`. The `Renderer` and
`BaseCharRenderer` had `BigDecimal` fallback logic, but `TomlRenderer`
was missing it.

## Modification

- Added `BigDecimal` fallback between the Long-range check and
`Double.toString`: when `math.round(d).toDouble != d` but `d % 1 == 0`,
uses `BigDecimal.toBigInt` for exact decimal output
- Added directional tests: large integer (1e20), regular integer (42),
and fraction (3.14)
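
The three-way decision described above (Long range, then `BigDecimal`, then `Double.toString`) can be sketched in Java. The `TomlNumber` class and `render` method are hypothetical names, and this is an approximation of the logic rather than the `TomlRenderer` code itself.

```java
import java.math.BigDecimal;

// Hypothetical sketch of the number-rendering decision; not the TomlRenderer source.
final class TomlNumber {
    static String render(double d) {
        long rounded = Math.round(d);
        if ((double) rounded == d) {
            // Value is an integer that fits in the Long range.
            return Long.toString(rounded);
        }
        if (d % 1 == 0) {
            // Integral but outside the Long range: expand the exact value in decimal.
            return new BigDecimal(d).toBigInteger().toString();
        }
        // Genuine fraction (or non-finite value): fall back to the default rendering.
        return Double.toString(d);
    }
}
```

For `1e20`, `Math.round` saturates at `Long.MAX_VALUE`, so the first test fails and the `BigDecimal` branch prints all 21 digits instead of `1.0E20`.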

## Result

`std.manifestToml({a: 1e20})` now renders `a = 100000000000000000000`
instead of `a = 1.0E20`, matching go-jsonnet and producing valid TOML
output.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.manifestToml({a: 1e20})` | `a = 100000000000000000000` | `a = 100000000000000000000` | `a = 1.0E20` ❌ | `a = 100000000000000000000` ✅ |
| `std.manifestToml({a: 42})` | `a = 42` | `a = 42` | `a = 42` ✅ | `a = 42` ✅ |
| `std.manifestToml({a: 3.14})` | `a = 3.14` | `a = 3.14` | `a = 3.14` ✅ | `a = 3.14` ✅ |
…ce (databricks#957)

## Motivation

`YamlRenderer` block scalar rendering had two bugs: (1) multiple
trailing newlines were lost because `Pattern.split` discards trailing
empty strings and the code always used `|` (clip) mode; (2) strings with
leading whitespace were missing the required YAML indent indicator
(e.g., `|2`), causing YAML parsers to misinterpret leading spaces as
structural indentation.

Additionally, `PrettyYamlRenderer` had an off-by-one bug (`len > 2`
instead of `len > 1`) that caused 2-character strings like `"\n\n"` to
use clip mode `|` instead of keep mode `|+`.

## Modification

- Changed `split(s.toString)` to `split(str, -1)` to preserve trailing
empty strings
- Added detection of multiple trailing newlines to use `|+` (keep) mode
instead of `|` (clip)
- Added `blockOffsetNumeral` for leading whitespace indent indicator
when first character is a space
- Used `appendString(blockStyle)` for the block style string (append
only accepts Char/Int)
- Fixed `PrettyYamlRenderer` off-by-one: `len > 2` → `len > 1`
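
The difference between the default split and a split with a negative limit is easy to show in isolation. The Java sketch below uses `String.split`, which follows the same limit rules as `Pattern.split`. The `BlockScalarLines` class and its method names are hypothetical, and the trailing-newline test is a simplified stand-in for the renderer's keep-mode detection.

```java
// Hypothetical sketch of the line-splitting step; names are illustrative.
final class BlockScalarLines {
    /** Default split: trailing empty strings are discarded. */
    static String[] lossy(String s) {
        return s.split("\n");
    }

    /** Negative limit: trailing empty strings are kept. */
    static String[] preserving(String s) {
        return s.split("\n", -1);
    }

    /** True when the text ends with more than one newline (the keep-mode case). */
    static boolean hasMultipleTrailingNewlines(String s) {
        return s.endsWith("\n\n");
    }
}
```

For `"hello\n\n\n"` the default split returns a single element, while the negative-limit split returns four, so the three trailing line breaks remain visible to the renderer.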

## Result

YAML block scalars now correctly preserve multiple trailing newlines and
include indent indicators for leading whitespace, producing
spec-compliant YAML output.

## References

| Expression | go-jsonnet v0.22.0 | jrsonnet v0.5.0-pre98 | sjsonnet (before) | sjsonnet (after) |
|---|---|---|---|---|
| `std.manifestYamlDoc({a: "hello\n\n\n"})` | `"a": \|+` with trailing newlines | `"a": \|+` with trailing newlines | `"a": \|` truncated ❌ | `"a": \|+` correct ✅ |
Motivation:
SnakeYAML's SafeConstructor uses YAML 1.1 implicit type resolution which
does not recognize the 0o prefix for octal integers introduced in YAML 1.2.
This caused std.parseYaml to treat unquoted 0o777 as the string "0o777"
instead of the integer 511, diverging from go-jsonnet and jrsonnet.

Modification:
Replaced SafeConstructor-based parsing with composeAll() which gives access
to raw YAML nodes with scalar style information. Added yamlNodeToJson()
that handles YAML 1.2 octal (0o prefix) for plain (unquoted) scalars while
correctly preserving quoted values as strings. Also handles all other YAML
scalar types (int, float, bool, null) with full YAML 1.1 compatibility.
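
A rough Java sketch of the plain-versus-quoted distinction for 0o octals follows. The `Yaml12Octal` class, the `resolve` signature, and the regex are assumptions made for illustration. The actual yamlNodeToJson() works on SnakeYAML nodes and also handles the other scalar types, which are omitted here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the yamlNodeToJson() implementation.
final class Yaml12Octal {
    // Assumed shape: optional minus sign, "0o", then octal digits.
    private static final Pattern OCTAL = Pattern.compile("-?0o[0-7]+");

    /** Returns a Long for a plain 0o scalar; otherwise returns the text unchanged. */
    static Object resolve(String text, boolean plain) {
        if (plain && OCTAL.matcher(text).matches()) {
            boolean negative = text.charAt(0) == '-';
            String digits = text.substring(text.indexOf('o') + 1);
            long value = Long.parseLong(digits, 8); // parse the digits as base 8
            return negative ? -value : value;
        }
        // Quoted scalars (plain == false) always stay strings.
        return text;
    }
}
```

Passing the scalar style alongside the text is what lets an unquoted 0o777 become 511 while a quoted "0o777" stays a string.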

Result:
std.parseYaml now correctly parses both legacy (0777) and modern (0o777)
octal syntax for unquoted values, while quoted "0o777" remains a string,
matching go-jsonnet and jrsonnet behavior exactly.

| YAML input | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet (before) | sjsonnet (after) |
|-----------|-------------------|---------------------|-------------------|-----------------|
| 0777      | 511               | 511                 | 511               | 511             |
| 0o777     | 511               | 511                 | "0o777" (bug)     | 511             |
| 0o10      | 8                 | 8                   | "0o10" (bug)      | 8               |
| -0o777    | -511              | -511                | "-0o777" (bug)    | -511            |
| "0o777"   | "0o777"           | "0o777"             | "0o777"           | "0o777"         |
Motivation:
go-jsonnet treats YAML input containing an explicit document start marker
(---) as a multi-document stream, always returning an array even when
there is only one document. sjsonnet returned the single document
directly, diverging from go-jsonnet for inputs like "---", "---\n",
and "---\na: 1".

Note: jrsonnet 0.5.0-pre99 does NOT wrap single-doc YAML with --- in
an array (returns null for "---"), so this aligns sjsonnet with
go-jsonnet's stricter behavior.

Modification:
Added YamlDocStartPattern regex that detects --- followed by whitespace
or end-of-string (per YAML spec). When detected and composeAll returns
a single document, the result is wrapped in an array. Updated existing
ParseYaml tests and go_test_suite golden file to match go-jsonnet.
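
The detection and wrapping rule can be sketched in Java as below. The regex shown is an assumption that matches the description (--- followed by whitespace or end-of-string); the actual YamlDocStartPattern is not quoted in this message, and anchoring at the very start of the input is likewise assumed. The `DocStart` class and its methods are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; the real YamlDocStartPattern may differ.
final class DocStart {
    // "---" at the start of the input, followed by whitespace or end of input.
    private static final Pattern DOC_START = Pattern.compile("^---(\\s|$)");

    static boolean hasExplicitStart(String yaml) {
        return DOC_START.matcher(yaml).find();
    }

    /** Unwraps a lone document only when the input has no explicit start marker. */
    static Object result(String yaml, List<Object> docs) {
        if (docs.size() == 1 && !hasExplicitStart(yaml)) {
            return docs.get(0);
        }
        return docs; // explicit marker or several documents: keep the array
    }
}
```

Requiring whitespace or end of input after the dashes keeps an input such as "---abc" from being treated as a document start marker.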

Result:
std.parseYaml now wraps a single document in an array whenever the input
starts with an explicit document start marker, matching go-jsonnet for
the inputs tabulated below.

| YAML input | go-jsonnet v0.22.0 | jrsonnet 0.5.0-pre99 | sjsonnet (before) | sjsonnet (after) |
|-----------|-------------------|---------------------|-------------------|-----------------|
| "---"     | [null]            | null                | null (bug)        | [null]          |
| "---\n"   | [null]            | null                | null (bug)        | [null]          |
| "---\na: 1"| [{a:1}]          | {a:1}               | {a:1} (bug)       | [{a:1}]         |
| "--- 3\n" | [3]               | 3                   | 3 (bug)           | [3]             |
| "a: 1"    | {a:1}             | {a:1}               | {a:1}             | {a:1}           |
@He-Pin He-Pin closed this Jun 18, 2026
@He-Pin He-Pin deleted the fix/parseyaml-doc-marker-v2 branch June 18, 2026 10:31