$ curl -s -O https://data.gharchive.org/2023-02-08-3.json.gz &&
gunzip 2023-02-08-3.json.gz &&
head -53 2023-02-08-3.json > 53.json &&
super -dynamic -vam -c "
SELECT count() AS count
FROM '53.json' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"
panic: (*vector.Fusion) 0x16280e7a01e0
goroutine 83 [running]:
github.com/brimdata/super/runtime/vam/expr.(*search).eval(0x16280e6a67e0, {0x16280e332670?, 0x16280ea80008?, 0x16280e606460?})
/Users/phil/work/super/runtime/vam/expr/search.go:103 +0x634
github.com/brimdata/super/vector.Apply(0x1, 0x16280ec11d58, {0x16280e332670, 0x1, 0x1?})
/Users/phil/work/super/vector/apply.go:25 +0xf6
github.com/brimdata/super/vector.Apply-range1(...)
/Users/phil/work/super/vector/apply.go:30
github.com/brimdata/super/vector.Apply.rip.func1(...)
/Users/phil/work/super/vector/apply.go:58
github.com/brimdata/super/vector.Apply(0x1, 0x16280ec11d58, {0x16280e332660, 0x1, 0x1?})
/Users/phil/work/super/vector/apply.go:28 +0x207
github.com/brimdata/super/runtime/vam/expr.(*search).Eval(0x16280e6a67e0, {0xd749c18?, 0x16280e7a0420?})
/Users/phil/work/super/runtime/vam/expr/search.go:61 +0x9e
github.com/brimdata/super/runtime/vam/op.(*Filter).Pull(0x16280e6a6810, 0x0)
/Users/phil/work/super/runtime/vam/op/filter.go:26 +0x65
github.com/brimdata/super/runtime/vam/op/aggregate.(*scalarAggregate).Pull(0x16280e38fb90, 0x0)
/Users/phil/work/super/runtime/vam/op/aggregate/scalar.go:48 +0xa9
github.com/brimdata/super/runtime/vam/op.(*combineParent).run(0x16280e84e680)
/Users/phil/work/super/runtime/vam/op/combine.go:109 +0x3c
created by github.com/brimdata/super/runtime/vam/op.(*Combine).Pull.func1 in goroutine 1
/Users/phil/work/super/runtime/vam/op/combine.go:42 +0x35
Details
Repro is with super commit 6a4216d. The repro data is a subset of the GitHub Archive data set often used in testing.
The same query runs without error when the file is read with the regular JSON reader, i.e., without the (format fjson) clause.
$ super -version
Version: v0.3.0-150-g6a4216db5
$ super -dynamic -vam -c "
SELECT count() AS count
FROM '53.json'
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"
{count:0}
The panic also seems to depend on the particular mix of data in these 53 values when read with fjson: it is not triggered with only the first 52 values, nor with only the 53rd value.
$ head -52 2023-02-08-3.json > 52.json
$ super -dynamic -vam -c "
SELECT count() AS count
FROM '52.json' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"
{count:0}
$ tail -1 53.json > 1.json
$ super -dynamic -vam -c "
SELECT count() AS count
FROM '1.json' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"
{count:0}