Self-Describing Binary Log Format (v3) #13231
Draft
masaori335 wants to merge 2 commits into
Seven boolean/counter fields were declared dINT, but their marshal functions write a single int, and proxy_protocol_version (ppv) was declared dINT while it actually marshals a string. The dINT type wrongly excludes these fields from log filters and aggregates, and the ppv mislabeling misrepresents variable-length string bytes as two fixed ints to any type-driven consumer. Retype the single-int fields as sINT and ppv as STRING so the declared type matches what each marshal function emits.
Publish each field's type in a per-segment schema so a generic reader can
decode a .blog from the file alone, without an embedded ATS symbol-to-type
table that must track the writer in lockstep. The per-field code is
LogField::Type serialized directly (now an enum class : uint8_t with INVALID=0
reserved and sINT..IP = 1..4 as the frozen wire codes); a static_assert pins
the values. This relies on each field's declared type matching its marshalled
framing, which the parent commit ("Fix mismatched sINT/dINT log field types")
establishes.
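As a rough illustration of the frozen wire codes described above, the following sketch declares a stand-in enum and pins its values at compile time. `WireType` and `is_known_type_code` are hypothetical names, not the PR's `LogField::Type` definition, and `dINT = 2` is inferred from the `sINT..IP = 1..4` range rather than stated explicitly.

```cpp
#include <cstdint>

// Sketch only: a stand-in for the enum described above, not the real LogField.h.
enum class WireType : uint8_t {
  INVALID = 0, // reserved; never a valid on-disk code
  sINT    = 1,
  dINT    = 2, // assumed position, inferred from the 1..4 range
  STRING  = 3,
  IP      = 4,
};

// Pin the on-disk values so a reorder of the enumerators fails to compile.
static_assert(static_cast<uint8_t>(WireType::INVALID) == 0, "INVALID must stay 0");
static_assert(static_cast<uint8_t>(WireType::sINT) == 1, "sINT wire code is frozen");
static_assert(static_cast<uint8_t>(WireType::dINT) == 2, "dINT wire code is frozen");
static_assert(static_cast<uint8_t>(WireType::STRING) == 3, "STRING wire code is frozen");
static_assert(static_cast<uint8_t>(WireType::IP) == 4, "IP wire code is frozen");

// Hypothetical reader-side check: is a byte taken from a schema a known code?
constexpr bool
is_known_type_code(uint8_t code)
{
  return code >= static_cast<uint8_t>(WireType::sINT) && code <= static_cast<uint8_t>(WireType::IP);
}
```

Because the enumerator values are the wire codes, a writer in this style can emit the byte with a plain `static_cast<uint8_t>` and needs no translation table.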
Readers (LogBufferIterator, logcat, logstats, the ASCII output paths) accept
both v2 and v3 segments, sizing the header read to the on-disk version, so a v3
build keeps decoding logs written by an older one. Integer values stay in host
byte order, as in v2 (no endianness change). The public TSLogType enum is given
the same values as LogField::Type so TSLogFieldRegister can static_cast between
them; static_asserts in InkAPI.cc (the only TU that sees both) pin the
alignment so a future reorder fails to compile.
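The alignment between the public enum and the internal one could be pinned roughly as follows. This is a minimal sketch: `PublicLogType`, `FieldType`, and `to_field_type` are hypothetical stand-ins for `TSLogType`, `LogField::Type`, and whatever validation `InkAPI.cc` performs, and rejecting code 2 is an assumption based on DINT being omitted from the public API.

```cpp
#include <cstdint>
#include <optional>

namespace sketch
{
// Stand-in for the internal enum (values as described in this PR).
enum class FieldType : uint8_t { INVALID = 0, sINT = 1, dINT = 2, STRING = 3, IP = 4 };

// Stand-in for the public enum; note the deliberate gap at 2.
enum PublicLogType {
  PUBLIC_LOG_TYPE_INT    = 1,
  PUBLIC_LOG_TYPE_STRING = 3,
  PUBLIC_LOG_TYPE_ADDR   = 4,
};

// If either enum is reordered, these fail at compile time.
static_assert(static_cast<int>(FieldType::sINT) == PUBLIC_LOG_TYPE_INT, "INT must mirror sINT");
static_assert(static_cast<int>(FieldType::STRING) == PUBLIC_LOG_TYPE_STRING, "STRING must mirror STRING");
static_assert(static_cast<int>(FieldType::IP) == PUBLIC_LOG_TYPE_ADDR, "ADDR must mirror IP");

// Validate a plugin-provided value before casting it to the internal type.
inline std::optional<FieldType>
to_field_type(int public_type)
{
  switch (public_type) {
  case PUBLIC_LOG_TYPE_INT:
  case PUBLIC_LOG_TYPE_STRING:
  case PUBLIC_LOG_TYPE_ADDR:
    return static_cast<FieldType>(public_type);
  default:
    return std::nullopt; // unknown or non-public code
  }
}
} // namespace sketch
```

With the values pinned, the cast itself is trivial; the range check is what keeps an out-of-range plugin value from ever reaching a segment schema.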
The writer version is per-LogObject: logging.yaml "binary_log_version: 2"
pins a binary log to the pre-v3 layout (no schema, shorter header) so a
not-yet-upgraded downstream parser keeps working during a migration; the
default is v3.
Decoding untrusted .blog input is bounded: LogBufferIterator validates
data_offset and each entry against the segment, and the JSON decoder validates
the schema offset alignment and cross-checks field_count against the symbol
list.
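The kind of bounds checking described above might look like the sketch below. `SegmentView` and both helpers are hypothetical and greatly simplified; the real header carries more fields and the actual checks in `LogBufferIterator` may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified view of one segment as read from disk.
struct SegmentView {
  size_t   byte_count;  // total bytes in the segment, header included
  uint32_t data_offset; // offset of the first entry from the segment start
  size_t   header_size; // on-disk header size for this segment's version
};

// data_offset must point at or past the header and stay inside the segment.
inline bool
data_offset_ok(const SegmentView &s)
{
  return s.data_offset >= s.header_size && s.data_offset <= s.byte_count;
}

// An entry occupying [offset, offset + len) must lie wholly inside the data
// area. The comparison is arranged so that offset + len is never computed and
// therefore cannot overflow.
inline bool
entry_ok(const SegmentView &s, size_t offset, size_t len)
{
  return len > 0 && offset >= s.data_offset && offset <= s.byte_count && len <= s.byte_count - offset;
}
```

An iterator in this style would call `data_offset_ok` once per segment and `entry_ok` before touching each entry, stopping at the first failure instead of reading past the buffer.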
Pull request overview
This PR introduces binary log format v3 for Apache Traffic Server logging, making .blog segments self-describing by embedding a per-segment field-type schema. This decouples binary log decoding from the exact ATS build that produced the log and adds tooling/tests/docs around the new format.
Changes:
- Add a v3 per-segment field-type schema to the binary log segment header and make segment header reads version-sized (v2/v3 compatibility).
- Extend `traffic_logcat` with a schema-driven JSON output mode (`-j`/`--json`) and harden iteration/decoding against malformed segments.
- Add unit + gold tests and documentation for the v3 on-disk format and configuration (`binary_log_version`).
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/gold_tests/logging/gold/binary_log_v3_json.gold | Gold output for traffic_logcat -j JSON decoding of v3 logs. |
| tests/gold_tests/logging/gold/binary_log_v3_ascii.gold | Gold output for ASCII decoding of v2/v3 logs. |
| tests/gold_tests/logging/binary_log_v3.test.py | End-to-end gold test exercising v2/v3 binary logs and v3 JSON output. |
| src/traffic_logstats/logstats.cc | Read segment headers using on-disk header size; add corruption guards. |
| src/traffic_logcat/unit-tests/test_LogEntryJson.cc | Unit tests for schema-driven v3 JSON reference decoder. |
| src/traffic_logcat/LogEntryJson.h | Public interface/contract for v3 entry-to-JSON reference decoder. |
| src/traffic_logcat/LogEntryJson.cc | Implementation of schema-only v3 JSON decoder with bounds checks. |
| src/traffic_logcat/logcat.cc | Add -j/--json option and version-sized header reads; emit JSON lines. |
| src/traffic_logcat/CMakeLists.txt | Build LogEntryJson into traffic_logcat and add Catch2 unit test target. |
| src/proxy/logging/YamlLogConfig.cc | Add binary_log_version logging.yaml key parsing and propagation to LogObject. |
| src/proxy/logging/unit-tests/test_LogBuffer.cc | Unit tests covering v3 schema/type alignment and version-sized header sizing/iteration. |
| src/proxy/logging/LogObject.cc | Extend LogObject ctor to store per-object binary_log_version. |
| src/proxy/logging/LogFormat.cc | Update type enum usage for aggregation checks. |
| src/proxy/logging/LogFilter.cc | Update type enum usage and improve unknown-type error reporting. |
| src/proxy/logging/LogFile.cc | Accept v2/v3 segment versions for ASCII conversion paths. |
| src/proxy/logging/LogField.cc | Convert type to scoped enum and update assertions/display formatting. |
| src/proxy/logging/LogBuffer.cc | Write v3 type schema, set per-object segment version, and harden iterator. |
| src/proxy/logging/LogAccess.cc | Fix pointer advancement bug in unmarshal_http_version. |
| src/proxy/logging/Log.cc | Align field declarations to new LogField::Type and fix several field types (incl. ppv). |
| src/proxy/logging/CMakeLists.txt | Add Catch2 unit test target for LogBuffer v3. |
| src/api/InkAPI.cc | Pin TSLogType-to-LogField::Type relationship and validate plugin-provided types. |
| include/ts/apidefs.h.in | Update TSLogType enum values/comments to mirror v3 wire-type codes. |
| include/proxy/logging/LogObject.h | Add binary_log_version ctor arg/default and accessor/storage. |
| include/proxy/logging/LogFormat.h | Expose field_list() for writer-side schema emission in v3. |
| include/proxy/logging/LogField.h | Redefine LogField::Type as a stable, append-only wire-code enum class. |
| include/proxy/logging/LogBuffer.h | Bump LOG_SEGMENT_VERSION to 3; add schema struct and helper sizing/version utilities. |
| doc/developer-guide/logging-architecture/index.en.rst | Add v3 format page to logging architecture docs index. |
| doc/developer-guide/logging-architecture/binary-log-v3-format.en.rst | New specification document for v3 on-disk format and decoding rules. |
| doc/appendices/command-line/traffic_logstats.en.rst | Document v2/v3 support and reference v3 format spec. |
| doc/appendices/command-line/traffic_logcat.en.rst | Document v2/v3 support and new -j/--json option behavior. |
| doc/admin-guide/files/logging.yaml.en.rst | Document binary_log_version key and provide v2/v3 configuration examples. |
Comment on lines +91 to +99

    char *
    LogBufferHeader::fmt_fieldtypes()
    {
      char *addr = nullptr;
      if (fmt_fieldtypes_offset) {
        addr = reinterpret_cast<char *>(this) + fmt_fieldtypes_offset;
      }
      return addr;
    }

Comment on lines +178 to +188

    for (const char *p = s; p < nul; ++p) {
      // Minimal JSON escaping for structural characters.
      if (*p == '"' || *p == '\\') {
        if (!put_ch('\\')) {
          return -1;
        }
      }
      if (!put_ch(*p)) {
        return -1;
      }
    }

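The excerpt above escapes only `"` and `\`. JSON also requires control characters below U+0020 to be escaped, so a fuller escaper might look like the following sketch. `json_escape` is a hypothetical standalone helper, not the PR's code, and it builds a `std::string` instead of writing through `put_ch` into a bounded buffer.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: escape a byte string for use inside a JSON string.
// Bytes >= 0x80 are passed through unchanged, which assumes the input is
// already valid UTF-8.
inline std::string
json_escape(std::string_view in)
{
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in) {
    switch (c) {
    case '"':
      out += "\\\"";
      break;
    case '\\':
      out += "\\\\";
      break;
    case '\n':
      out += "\\n";
      break;
    case '\r':
      out += "\\r";
      break;
    case '\t':
      out += "\\t";
      break;
    default:
      if (c < 0x20) {
        // Remaining control characters use the six-character \u00XX form.
        char buf[7];
        std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
        out += buf;
      } else {
        out += static_cast<char>(c);
      }
    }
  }
  return out;
}
```

Whether the extra cases matter depends on what the logged strings can contain; header values and URLs taken from the network can carry arbitrary bytes, so a decoder that feeds other JSON tools generally benefits from handling them.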
Comment on lines 1652 to 1657

     enum TSLogType {
    -  TS_LOG_TYPE_INT,
    +  TS_LOG_TYPE_INT = 1, ///< LogField::Type::sINT
       // DINT is omitted from the public API for now, until we decide whether we keep the type
    -  TS_LOG_TYPE_STRING = 2,
    -  TS_LOG_TYPE_ADDR = 3,
    +  TS_LOG_TYPE_STRING = 3, ///< LogField::Type::STRING
    +  TS_LOG_TYPE_ADDR = 4, ///< LogField::Type::IP
     };

Author reply: This renumbering is fine because TSLogType is going to be released by 11.0.0; it's not published yet.
Motivation
In version 2, a segment header carries the field symbols (`fmt_fieldlist`,
e.g. `"chi cqu pssc"`) and a printf-style template (`fmt_printf`) but
not the field types. To decode an entry a reader had to already know the
type of each symbol, because the value encodings are only self-delimiting once
the type is known (`IP` is variable length, for example). That coupled every
out-of-tree parser to the exact ATS build that wrote the log.
Version 3 adds one thing: a per-segment field-type schema that lists the
wire type of every field, in field order. Decoding then needs only the symbols
(as keys) and the schema (for types).
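A reader's first step under this scheme could be sketched as follows: split the symbol list, pair each symbol with its type code, and reject the segment if the counts disagree or a code is unknown. `pair_symbols_with_types` is a hypothetical helper; the whitespace-separated symbol list follows the `"chi cqu pssc"` example above, and the 1..4 code range follows the PR description.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One (symbol, wire type code) pair per field, in field order.
using FieldSchema = std::vector<std::pair<std::string, uint8_t>>;

// Hypothetical helper: build the per-segment schema from the symbol list and
// the type codes. Returns nullopt when the two do not line up.
inline std::optional<FieldSchema>
pair_symbols_with_types(const std::string &fieldlist, const std::vector<uint8_t> &type_codes)
{
  std::istringstream       iss(fieldlist);
  std::vector<std::string> symbols;
  for (std::string sym; iss >> sym;) {
    symbols.push_back(sym);
  }

  // Cross-check the field count against the symbol list.
  if (symbols.size() != type_codes.size()) {
    return std::nullopt;
  }

  FieldSchema schema;
  schema.reserve(symbols.size());
  for (size_t i = 0; i < symbols.size(); ++i) {
    // 0 is reserved as INVALID; anything above 4 is unknown to this reader.
    if (type_codes[i] < 1 || type_codes[i] > 4) {
      return std::nullopt;
    }
    schema.emplace_back(symbols[i], type_codes[i]);
  }
  return schema;
}
```

Once the pairs exist, each entry can be walked field by field, using the type code to decide how many bytes the next value occupies and the symbol as its output key.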
Depends on #13223