Panic when attempting to grep JSON that was read via fjson #6985

@philrz

Description

$ curl -s -O https://data.gharchive.org/2023-02-08-3.json.gz &&
  gunzip 2023-02-08-3.json.gz &&
  head -53 2023-02-08-3.json > 53.json &&
  super -dynamic -vam -c "
SELECT count() AS count
FROM '53.json' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"

panic: (*vector.Fusion) 0x16280e7a01e0

goroutine 83 [running]:
github.com/brimdata/super/runtime/vam/expr.(*search).eval(0x16280e6a67e0, {0x16280e332670?, 0x16280ea80008?, 0x16280e606460?})
	/Users/phil/work/super/runtime/vam/expr/search.go:103 +0x634
github.com/brimdata/super/vector.Apply(0x1, 0x16280ec11d58, {0x16280e332670, 0x1, 0x1?})
	/Users/phil/work/super/vector/apply.go:25 +0xf6
github.com/brimdata/super/vector.Apply-range1(...)
	/Users/phil/work/super/vector/apply.go:30
github.com/brimdata/super/vector.Apply.rip.func1(...)
	/Users/phil/work/super/vector/apply.go:58
github.com/brimdata/super/vector.Apply(0x1, 0x16280ec11d58, {0x16280e332660, 0x1, 0x1?})
	/Users/phil/work/super/vector/apply.go:28 +0x207
github.com/brimdata/super/runtime/vam/expr.(*search).Eval(0x16280e6a67e0, {0xd749c18?, 0x16280e7a0420?})
	/Users/phil/work/super/runtime/vam/expr/search.go:61 +0x9e
github.com/brimdata/super/runtime/vam/op.(*Filter).Pull(0x16280e6a6810, 0x0)
	/Users/phil/work/super/runtime/vam/op/filter.go:26 +0x65
github.com/brimdata/super/runtime/vam/op/aggregate.(*scalarAggregate).Pull(0x16280e38fb90, 0x0)
	/Users/phil/work/super/runtime/vam/op/aggregate/scalar.go:48 +0xa9
github.com/brimdata/super/runtime/vam/op.(*combineParent).run(0x16280e84e680)
	/Users/phil/work/super/runtime/vam/op/combine.go:109 +0x3c
created by github.com/brimdata/super/runtime/vam/op.(*Combine).Pull.func1 in goroutine 1
	/Users/phil/work/super/runtime/vam/op/combine.go:42 +0x35

Details

Repro is with super commit 6a4216d. The repro data is a subset of the GitHub Archive data set often used in testing.

The same query works fine when the file is read with the regular JSON reader, i.e., without the (format fjson) clause.

$ super -version
Version: v0.3.0-150-g6a4216db5

$ super -dynamic -vam -c "
SELECT count() AS count
FROM '53.json'
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"

{count:0}

The trigger appears to be the particular mix of data in these 53 values when read with fjson: the panic does not occur with only the first 52 values, nor with only the 53rd value.

$ head -52 2023-02-08-3.json > 52.json

$ super -dynamic -vam -c "
SELECT count() AS count
FROM '52.json' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"

{count:0}

$ tail -1 53.json > 1.json

$ super -dynamic -vam -c "
SELECT count() AS count
FROM '1.json' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"

{count:0}
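
Since neither the first 52 values nor the 53rd value alone triggers the panic, one way to narrow the repro further might be to pair each earlier value with the last one and see which pairs fail. The following is only a sketch under assumptions: find_pairs and check_fjson are hypothetical helper names, pair.json is a scratch file it overwrites, and it relies on a Go panic making the process exit with a non-zero status. If no pair reproduces the panic, the trigger needs more than two values and this approach would have to be widened.

```shell
#!/bin/sh
# find_pairs FILE LAST CHECK
# For every line i before line LAST in FILE, write lines i and LAST to
# pair.json and run CHECK on it. Prints each i for which CHECK exits non-zero.
# (Hypothetical helper, not part of super.)
find_pairs() {
  file=$1
  last=$2
  check=$3
  i=1
  while [ "$i" -lt "$last" ]; do
    { sed -n "${i}p" "$file"; sed -n "${last}p" "$file"; } > pair.json
    if ! "$check" pair.json >/dev/null 2>&1; then
      echo "$i"
    fi
    i=$((i + 1))
  done
}

# check_fjson FILE
# Runs the query from this issue against FILE using the fjson reader.
# Assumption: a panic in super shows up as a non-zero exit status.
check_fjson() {
  super -dynamic -vam -c "
SELECT count() AS count
FROM '$1' (format fjson)
WHERE grep('in case you have any feedback 😊', payload.pull_request.body);"
}
```

For the data in this issue the call would be: find_pairs 53.json 53 check_fjson. Each number printed is the line number of a value that, together with the 53rd value, is enough to make the check fail.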

Metadata

Labels: bug (Something isn't working)