Skip to content

fix(vcon): decode body per spec on read, canonicalize on write#182

Merged
pavanputhra merged 1 commit into
mainfrom
worktree-fix-body-encoding-decode
May 22, 2026
Merged

fix(vcon): decode body per spec on read, canonicalize on write#182
pavanputhra merged 1 commit into
mainfrom
worktree-fix-body-encoding-decode

Conversation

@pavanputhra
Copy link
Copy Markdown
Contributor

Summary

  • draft-ietf-vcon-vcon-core-02 §2.3.2 mandates that attachment/analysis body is always a String, with encoding (json/none/base64url) deciding how to interpret it.
  • Once the Redis store path started stringifying dict/list bodies to satisfy this, readers that assumed body was still the live Python value started crashing. The first one to surface was Vcon.add_tag calling .append on a JSON-encoded string.
  • This PR fixes the read sites, canonicalizes the write sites, and adds regression coverage.

What changed

Helpers on Vcon (common/vcon.py):

  • Vcon.decoded_body(entry) — returns the live Python value, parsing only when encoding == "json" and body is a string. none/base64url bodies pass through.
  • Vcon.with_decoded_body(entry) — shallow copy of an entry with body decoded. Used by callers that want to navigate into body with dict syntax.

Write paths now produce spec-correct shape directly:

  • add_tag writes {"purpose": "tags", "body": json.dumps([…]), "encoding": "json"}.
  • add_attachment / add_analysis JSON-encode dict/list bodies immediately and force encoding="json". The storage normalizer becomes a no-op backstop rather than load-bearing.

Read paths decode before structural access:

  • tag_router, filters.is_included, post_analysis_to_slack, hugging_llm_link, milvus storage extractor, check_and_tag, detect_engagement.
  • Attachment lookups accept either spec-current purpose or legacy type (mirrors find_attachment_by_purpose).
  • filters.is_included gets TypedDict annotations and accepts purpose as an alias for type in only_if configs.

Test plan

  • pytest tests/core/ conserver/links/check_and_tag/ conserver/links/detect_engagement/ conserver/links/tag_router/ conserver/links/post_analysis_to_slack/ → 110 passed, 6 skipped (pre-existing).
  • Exercise the original add_tag crash scenario on a vCon that has been stored and reloaded — no longer raises AttributeError.
  • Run a chain that uses is_included with the new purpose key in the only_if clause.

🤖 Generated with Claude Code

draft-ietf-vcon-vcon-core-02 §2.3.2 mandates that attachment and analysis
body fields are always a String, with encoding (json/none/base64url)
deciding interpretation. The Redis storage normalizer started stringifying
dict/list bodies on store, which broke every reader that assumed body was
still the underlying Python value — the reported crash was Vcon.add_tag
calling .append on a JSON-encoded string.

- Add Vcon.decoded_body and Vcon.with_decoded_body helpers.
- add_tag, add_attachment, add_analysis now write spec-correct shape
  (encoding=json + json.dumps(body)) at the boundary instead of relying
  on the storage normalizer to canonicalize.
- add_tag writes the new attachment under the spec-current \`purpose\` key.
- Readers in tag_router, filters, post_analysis_to_slack, hugging_llm_link,
  milvus, check_and_tag, detect_engagement decode body before dict/list
  access; attachment lookups accept either \`purpose\` or \`type\` for
  back-compat.
- filters.is_included gains TypedDict annotations and accepts \`purpose\`
  as an alias for \`type\`.
- Regression tests added across every touched path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@pavanputhra pavanputhra merged commit 4fc8e8e into main May 22, 2026
1 check failed
pavanputhra added a commit that referenced this pull request May 26, 2026
…3) (#184)

The in-house navigate_dict, copy-pasted into five link modules, used
`key in current` for traversal — which on a non-dict (e.g. an analysis
whose body is a JSON-encoded string after #182) does a substring check
and then crashes with TypeError on `current[key]`.

Replace all five copies with pydash.get (already a core dep), and in
analyze + analyze_and_label also wrap source with Vcon.with_decoded_body
so a dotted text_location can still drill through a JSON-encoded body —
matching what check_and_tag and detect_engagement already do.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pavanputhra added a commit that referenced this pull request May 26, 2026
PR #182 made add_analysis/add_attachment coerce dict/list bodies to
JSON-encoded strings (forcing encoding=json). Three test files still
asserted the old in-memory dict shape and broke CI on main:

- test_encoding: rewrote the two pytest.raises blocks that expected
  dict/list bodies to raise; new code intentionally coerces. Replaced
  with assertions on the new shape, plus new raises for genuinely
  invalid input (bad JSON string, bad base64 string).
- test_get_transcription: switched to Vcon.decoded_body(transcript)
  before drilling into body["text"], since body is now a JSON string.
- TestCheckAndTagMetrics (3 tests): overriding analysis[0]["body"] with
  a plain string while leaving encoding=json caused with_decoded_body
  to crash on json.loads("Hello world"); the link's outer try/except
  swallowed the error and no metric was recorded. Also override
  encoding to "none" alongside the body override.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant