feat: Add machine-readable digest of comparison by MariusMerkleQC · Pull Request #28 · Quantco/diffly

Marius Merkle (MariusMerkleQC) · 2026-03-31T20:30:51Z

Motivation

See #27.

Changes

I debated a few different architectural options:

1. First, I added a digest.py file in parallel to the summary. This led to considerable duplication: depending on the input parameters of summary() such as top_k_column_changes, slim, etc., different data is put into the summary (and should therefore also be serialized). To avoid this, I switched to the second option.
2. Here, a new data class SummaryData is populated at initialization of the Summary class. The data can then be used for both
   2.1 Rich-rendered summaries
   2.2 Serialized JSON

The second option avoids code and logic duplication entirely but requires a significant refactor of the summary.py module.
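
To illustrate the second option, here is a minimal sketch of the "compute once, render twice" pattern. It is not the code from this PR: the fields of `SummaryData`, the column-oriented `dict` inputs, and the plain-text `format()` (standing in for Rich rendering) are all assumptions made for illustration. Only the names `Summary`, `SummaryData`, `_compute_summary_data`, `to_json()`, and `top_k_column_changes` come from the discussion.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class SummaryData:
    # Hypothetical fields; the real SummaryData layout is not shown on this page.
    equal: bool
    n_rows_left: int
    n_rows_right: int
    column_changes: dict[str, int] = field(default_factory=dict)


def _n_rows(frame: dict[str, list]) -> int:
    # Length of the first column, or 0 for a frame without columns.
    return len(next(iter(frame.values()), []))


def _compute_summary_data(
    left: dict[str, list], right: dict[str, list], top_k_column_changes: int
) -> SummaryData:
    # Count differing values per shared column, then keep only the top-k columns.
    common = [c for c in left if c in right]
    changes = {c: sum(a != b for a, b in zip(left[c], right[c])) for c in common}
    changes = {c: n for c, n in changes.items() if n > 0}
    ranked = sorted(changes.items(), key=lambda kv: kv[1], reverse=True)
    return SummaryData(
        equal=left == right,
        n_rows_left=_n_rows(left),
        n_rows_right=_n_rows(right),
        column_changes=dict(ranked[:top_k_column_changes]),
    )


class Summary:
    def __init__(
        self, left: dict[str, list], right: dict[str, list], top_k_column_changes: int = 0
    ) -> None:
        # The parameters shape the data exactly once; both outputs read from it.
        self._data = _compute_summary_data(left, right, top_k_column_changes)

    def format(self) -> str:
        # Plain-text stand-in for the Rich-rendered summary.
        d = self._data
        lines = [f"equal: {d.equal}", f"rows: {d.n_rows_left} -> {d.n_rows_right}"]
        lines += [f"{col}: {n}" for col, n in d.column_changes.items()]
        return "\n".join(lines)

    def to_json(self) -> str:
        # Serialize the same data that format() renders.
        return json.dumps(asdict(self._data), sort_keys=True)
```

Because both methods read from `self._data`, a parameter such as `top_k_column_changes` affects the text and JSON outputs identically, which is the duplication the first option would have had to reimplement.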

codecov · 2026-03-31T20:32:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (8e2216f) to head (bd9aa41).

Additional details and impacted files

@@            Coverage Diff             @@
##              main       #28    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           10        10            
  Lines          776       921   +145     
==========================================
+ Hits           776       921   +145

Marius Merkle (MariusMerkleQC) · 2026-04-01T16:34:55Z

...umns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt

@@ -18,7 +18,7 @@



This uncovered a bug in the summary fixtures, too.

Marius Merkle (MariusMerkleQC) · 2026-04-01T16:35:19Z

lexical-sprouting-scroll.md

@@ -0,0 +1,135 @@
+# Add `SummaryData` dataclass as the data layer for comparison output


I will remove this file before merging; I used it as a plan for Claude. It's outdated by now.

Marius Merkle (MariusMerkleQC) · 2026-04-01T16:36:11Z

diffly/summary.py

                    box=box.HEAVY,
                )
            )
-        if self._comparison.equal():


The changes below mostly reference the _data instead of the _comparison object.

Copilot

Pull request overview

This PR adds a machine-readable JSON “digest” for DataFrameComparison.summary() output by refactoring diffly.summary to compute a structured SummaryData once and using it for both Rich rendering and JSON serialization, plus wiring a new --json CLI flag.

Changes:

Refactor Summary to compute/store SummaryData and add Summary.to_json() for JSON serialization.
Add --json to the CLI to output the JSON digest instead of the Rich-formatted summary.
Extend tests and update Rich-output fixtures to reflect the refactor (notably the Columns table formatting).
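
As a rough sketch of the output switch described above, the following uses `argparse` from the standard library. Which CLI library diffly actually uses is not stated here, so the parser setup, the `FakeSummary` class, and the `render` helper are assumptions for illustration; only the `--json` flag and the `format()` / `to_json()` method names come from the overview.

```python
import argparse
import json


class FakeSummary:
    # Hypothetical stand-in for the real Summary object.
    def format(self) -> str:
        return "Rich-style text summary"

    def to_json(self) -> str:
        return json.dumps({"equal": True})


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="diffly-sketch")
    parser.add_argument(
        "--json",
        action="store_true",
        help="Print the JSON digest instead of the formatted summary.",
    )
    return parser


def render(summary: FakeSummary, argv: list[str]) -> str:
    # Choose the output representation based on the flag.
    args = build_parser().parse_args(argv)
    return summary.to_json() if args.json else summary.format()
```

Calling `render(FakeSummary(), ["--json"])` returns the JSON string, while an empty argument list returns the text summary.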

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

File	Description
diffly/summary.py	Introduces `SummaryData*` dataclasses, computes them in `_compute_summary_data`, refactors rendering to use them, and adds `Summary.to_json()`.
diffly/cli.py	Adds `--json` flag and switches output between `summary.format()` and `summary.to_json()`.
tests/cli/test_cli.py	Parametrizes CLI smoke test to cover both Rich output and `--json` output.
tests/summary/test_summary.py	Adds parametrized JSON-digest assertions and unit tests for `_to_python()` conversions.
lexical-sprouting-scroll.md	Adds a design/architecture note describing the refactor and JSON digest approach.
tests/summary/fixtures/many_pk_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/many_pk_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/many_pk_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/many_pk_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/lost_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/lost_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/lost_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/lost_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/gained_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/gained_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/gained_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/gained_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt	Updates expected Rich output fixture after Columns-table formatting changes.
tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt	Updates expected Rich output fixture after Columns-table formatting changes.
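
The table mentions unit tests for `_to_python()` conversions. The actual implementation is not shown on this page, so the sketch below is only an assumption about what such a helper might do: convert values that `json.dumps` rejects (dates, timedeltas, decimals) into JSON-friendly Python values. The function name `to_python_sketch` and the chosen representations (seconds for timedeltas, ISO strings for dates, strings for decimals) are hypothetical.

```python
import datetime as dt
import json
from decimal import Decimal
from typing import Any


def to_python_sketch(value: Any) -> Any:
    # Hypothetical conversion rules; the real _to_python() may differ.
    if isinstance(value, dt.timedelta):
        return value.total_seconds()
    if isinstance(value, (dt.datetime, dt.date, dt.time)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, dict):
        return {str(k): to_python_sketch(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_python_sketch(v) for v in value]
    # Strings, numbers, booleans, and None are already JSON-serializable.
    return value
```

Applying the helper before `json.dumps` avoids a `TypeError` for values such as `datetime.timedelta`, which the standard encoder does not support.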

feat: Add machine-readable digest of comparison

4cf7a5f

Marius Merkle (MariusMerkleQC) self-assigned this Mar 31, 2026

Marius Merkle (MariusMerkleQC) linked an issue Mar 31, 2026 that may be closed by this pull request

LLM-readable text summaries #27

Open

github-actions bot added the enhancement New feature or request label Mar 31, 2026

Marius Merkle (MariusMerkleQC) added 10 commits April 1, 2026 13:59

update plan

30f27b0

initial implementation

c4d62a9

improve test coverage

9633f1f

cli test coverage

74332e8

refactor

23b293e

improve

fbaaba1

clean up

a63e229

simplify test

f7292ad

improve test coverage

1bf4e1a

fix timedelta

42a9781

Marius Merkle (MariusMerkleQC) commented Apr 1, 2026

Merge branch 'main' into digest

369b6ae

Marius Merkle (MariusMerkleQC) requested a review from Copilot April 1, 2026 16:36

Copilot started reviewing on behalf of Marius Merkle (MariusMerkleQC) April 1, 2026 16:37

Copilot AI reviewed Apr 1, 2026

diffly/summary.py (outdated, resolved)

diffly/summary.py (resolved)

lexical-sprouting-scroll.md (resolved)

lexical-sprouting-scroll.md (resolved)

feedback copilot

bd9aa41

Marius Merkle (MariusMerkleQC) marked this pull request as ready for review April 1, 2026 16:56

Marius Merkle (MariusMerkleQC) requested review from EgeKaraismailogluQC and Oliver Borchert (borchero) as code owners April 1, 2026 16:56

Marius Merkle (MariusMerkleQC) wants to merge 13 commits into main from digest

2 participants