fix(vcon): decode body per spec on read, canonicalize on write #182
Merged
draft-ietf-vcon-vcon-core-02 §2.3.2 mandates that attachment and analysis body fields are always a String, with encoding (json/none/base64url) deciding interpretation. The Redis storage normalizer started stringifying dict/list bodies on store, which broke every reader that assumed body was still the underlying Python value; the reported crash was Vcon.add_tag calling .append on a JSON-encoded string.

- Add Vcon.decoded_body and Vcon.with_decoded_body helpers.
- add_tag, add_attachment, add_analysis now write spec-correct shape (encoding=json + json.dumps(body)) at the boundary instead of relying on the storage normalizer to canonicalize.
- add_tag writes the new attachment under the spec-current `purpose` key.
- Readers in tag_router, filters, post_analysis_to_slack, hugging_llm_link, milvus, check_and_tag, detect_engagement decode body before dict/list access; attachment lookups accept either `purpose` or `type` for back-compat.
- filters.is_included gains TypedDict annotations and accepts `purpose` as an alias for `type`.
- Regression tests added across every touched path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
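The decode-on-read behavior described in this commit can be sketched roughly as follows. This is an illustration only, not the code in the PR: the free functions `decoded_body` and `with_decoded_body` stand in for the `Vcon` helpers, and their exact signatures and edge-case handling are assumptions.

```python
import json
from typing import Any


def decoded_body(entry: dict) -> Any:
    """Return the body as a live Python value (assumed behavior)."""
    body = entry.get("body")
    # Only JSON-encoded string bodies are parsed; "none" and "base64url"
    # bodies are returned unchanged.
    if entry.get("encoding") == "json" and isinstance(body, str):
        return json.loads(body)
    return body


def with_decoded_body(entry: dict) -> dict:
    """Return a shallow copy of the entry with its body decoded."""
    copy = dict(entry)
    copy["body"] = decoded_body(entry)
    return copy


# A tags attachment as it might look after a store/reload round trip.
stored = {"purpose": "tags", "encoding": "json", "body": json.dumps(["lang:en"])}

# Calling stored["body"].append(...) would fail because the body is a str.
# Decoding first gives a real list to work with.
tags = decoded_body(stored)
tags.append("priority:high")
```

Note that the stored entry is left untouched here; a writer would still need to re-encode `tags` with `json.dumps` before saving it back.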
pavanputhra added a commit that referenced this pull request on May 26, 2026
…3) (#184) The in-house navigate_dict, copy-pasted into five link modules, used `key in current` for traversal — which on a non-dict (e.g. an analysis whose body is a JSON-encoded string after #182) does a substring check and then crashes with TypeError on `current[key]`. Replace all five copies with pydash.get (already a core dep), and in analyze + analyze_and_label also wrap source with Vcon.with_decoded_body so a dotted text_location can still drill through a JSON-encoded body — matching what check_and_tag and detect_engagement already do. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
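The failure mode in this commit message can be reproduced in a few lines. The sketch below is a reconstruction under assumptions: `navigate_dict_old` approximates the replaced helper from its description (it is not the original source), and `safe_get` is a hypothetical, much simpler stand-in for `pydash.get` so the example needs no third-party install.

```python
import json
from typing import Any


def navigate_dict_old(data: Any, path: str) -> Any:
    """Approximation of the replaced helper, reconstructed from its description."""
    current = data
    for key in path.split("."):
        if key in current:  # on a str this is a substring check
            current = current[key]  # raises TypeError when current is a str
        else:
            return None
    return current


def safe_get(data: Any, path: str, default: Any = None) -> Any:
    """Hypothetical stand-in for pydash.get: only descends into dicts."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


analysis = {"encoding": "json", "body": json.dumps({"text": "hello"})}

try:
    navigate_dict_old(analysis, "body.text")
    crashed = False
except TypeError:
    crashed = True

# Without decoding, the safe lookup returns the default instead of crashing.
missing = safe_get(analysis, "body.text")

# Decoding the body first lets a dotted path drill through it.
decoded = dict(analysis, body=json.loads(analysis["body"]))
found = safe_get(decoded, "body.text")
```

The substring check succeeds because the text `text` occurs inside the JSON string, which is why the old helper got as far as indexing the string before failing.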
pavanputhra added a commit that referenced this pull request on May 26, 2026
PR #182 made add_analysis/add_attachment coerce dict/list bodies to JSON-encoded strings (forcing encoding=json). Three test files still asserted the old in-memory dict shape and broke CI on main:

- test_encoding: rewrote the two pytest.raises blocks that expected dict/list bodies to raise; new code intentionally coerces. Replaced with assertions on the new shape, plus new raises for genuinely invalid input (bad JSON string, bad base64 string).
- test_get_transcription: switched to Vcon.decoded_body(transcript) before drilling into body["text"], since body is now a JSON string.
- TestCheckAndTagMetrics (3 tests): overriding analysis[0]["body"] with a plain string while leaving encoding=json caused with_decoded_body to crash on json.loads("Hello world"); the link's outer try/except swallowed the error and no metric was recorded. Also override encoding to "none" alongside the body override.
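The TestCheckAndTagMetrics failure comes down to a body/encoding mismatch in the fixture. The sketch below illustrates it under assumptions: `decoded_body` is a simplified stand-in for `Vcon.decoded_body`, and the fixture dicts are invented for illustration rather than taken from the test suite.

```python
import json


def decoded_body(entry: dict):
    """Assumed decode rule: parse only JSON-encoded string bodies."""
    body = entry.get("body")
    if entry.get("encoding") == "json" and isinstance(body, str):
        return json.loads(body)
    return body


analysis = {"type": "transcript", "encoding": "json", "body": json.dumps({"text": "hi"})}

# Broken fixture: body replaced with plain text, encoding still "json".
broken = dict(analysis, body="Hello world")
try:
    decoded_body(broken)
    decode_failed = False
except json.JSONDecodeError:
    decode_failed = True

# Fixed fixture: override the encoding together with the body.
fixed = dict(analysis, body="Hello world", encoding="none")
text = decoded_body(fixed)
```

Because the link wraps its work in a broad try/except, a decode error like the one above would be swallowed rather than failing loudly, which matches the "no metric was recorded" symptom.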
Summary
- Per draft-ietf-vcon-vcon-core-02 §2.3.2, `body` is always a String, with `encoding` (json/none/base64url) deciding how to interpret it.
- Once the Redis storage normalizer began stringifying dict/list bodies on store, every reader that assumed `body` was still the live Python value started crashing. The first one to surface was `Vcon.add_tag` calling `.append` on a JSON-encoded string.

What changed

Helpers on `Vcon` (common/vcon.py):
- `Vcon.decoded_body(entry)`: returns the live Python value, parsing only when `encoding == "json"` and body is a string. `none`/`base64url` bodies pass through.
- `Vcon.with_decoded_body(entry)`: shallow copy of an entry with body decoded. Used by callers that want to navigate into body with dict syntax.

Write paths now produce spec-correct shape directly:
- `add_tag` writes `{"purpose": "tags", "body": json.dumps([…]), "encoding": "json"}`.
- `add_attachment` / `add_analysis` JSON-encode dict/list bodies immediately and force `encoding="json"`. The storage normalizer becomes a no-op backstop rather than load-bearing.

Read paths decode before structural access:
- `tag_router`, `filters.is_included`, `post_analysis_to_slack`, `hugging_llm_link`, `milvus` storage extractor, `check_and_tag`, `detect_engagement`.
- Attachment lookups accept either `purpose` or legacy `type` (mirrors `find_attachment_by_purpose`).
- `filters.is_included` gets `TypedDict` annotations and accepts `purpose` as an alias for `type` in `only_if` configs.

Test plan
- `pytest tests/core/ conserver/links/check_and_tag/ conserver/links/detect_engagement/ conserver/links/tag_router/ conserver/links/post_analysis_to_slack/` → 110 passed, 6 skipped (pre-existing).
- `add_tag` crash scenario on a vCon that has been stored and reloaded: no longer raises `AttributeError`.
- `is_included` with the new `purpose` key in the `only_if` clause.

🤖 Generated with Claude Code
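As a rough illustration of the write-path and lookup changes in this description, the sketch below canonicalizes a dict/list body at write time and finds an attachment by either key. Both `canonical_entry` and `find_attachment` are hypothetical names invented for this example; the real `add_attachment` and `find_attachment_by_purpose` may differ in signature and behavior.

```python
import json
from typing import Any, Optional


def canonical_entry(purpose: str, body: Any, encoding: str = "none") -> dict:
    """Hypothetical helper: build an attachment in the spec shape."""
    if isinstance(body, (dict, list)):
        # Structured bodies are serialized immediately and tagged as JSON,
        # regardless of the encoding the caller passed.
        return {"purpose": purpose, "encoding": "json", "body": json.dumps(body)}
    # String bodies keep the caller-supplied encoding.
    return {"purpose": purpose, "encoding": encoding, "body": body}


def find_attachment(attachments: list, purpose: str) -> Optional[dict]:
    """Match on `purpose`, falling back to the legacy `type` key."""
    for att in attachments:
        if att.get("purpose") == purpose or att.get("type") == purpose:
            return att
    return None


# An attachment written before the change, using the legacy key.
legacy = {"type": "tags", "encoding": "json", "body": json.dumps(["legacy:1"])}

# An attachment written with the spec-current key.
new = canonical_entry("tags", ["lang:en"])

from_legacy = find_attachment([legacy], "tags")
from_new = find_attachment([new], "tags")
```

Serializing at the write boundary means an entry has the same shape in memory as it does after a store/reload round trip, so readers only need to handle one form.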