feat: Add machine-readable digest of comparison #28
Marius Merkle (MariusMerkleQC) wants to merge 13 commits into main
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | main | #28 | +/- |
|---|---|---|---|
| Coverage | 100.00% | 100.00% | |
| Files | 10 | 10 | |
| Lines | 776 | 921 | +145 |
| Hits | 776 | 921 | +145 |
@@ -18,7 +18,7 @@
This uncovered a bug in the summary fixtures, too.
@@ -0,0 +1,135 @@
# Add `SummaryData` dataclass as the data layer for comparison output
Will remove before merging, I used this as a plan for Claude. It's outdated by now.
        box=box.HEAVY,
    )
)
if self._comparison.equal():
The changes below mostly reference the _data instead of the _comparison object.
Pull request overview
This PR adds a machine-readable JSON “digest” for DataFrameComparison.summary() output by refactoring diffly.summary to compute a structured SummaryData once and using it for both Rich rendering and JSON serialization, plus wiring a new --json CLI flag.
Changes:
- Refactor `Summary` to compute/store `SummaryData` and add `Summary.to_json()` for JSON serialization.
- Add `--json` to the CLI to output the JSON digest instead of the Rich-formatted summary.
- Extend tests and update Rich-output fixtures to reflect the refactor (notably the Columns table formatting).
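The effect of the `--json` flag can be sketched roughly as follows. This is only an illustration: the `Summary` class here is a hypothetical stand-in with placeholder output, and `argparse` is an assumption, since the PR does not show which CLI framework `diffly/cli.py` uses.

```python
import argparse
import json


class Summary:
    """Hypothetical stand-in for diffly's Summary; not the real class."""

    def __init__(self, equal: bool) -> None:
        self._equal = equal

    def format(self) -> str:
        # Placeholder for the Rich-rendered text output.
        return "Data frames are equal" if self._equal else "Data frames differ"

    def to_json(self) -> str:
        # Placeholder for the machine-readable digest.
        return json.dumps({"equal": self._equal})


def render(summary: Summary, argv: list[str]) -> str:
    """Pick the output format based on an assumed --json flag."""
    parser = argparse.ArgumentParser(prog="diffly-sketch")
    parser.add_argument("--json", action="store_true", help="emit the JSON digest")
    args = parser.parse_args(argv)
    # The flag only selects the renderer; the comparison itself is unchanged.
    return summary.to_json() if args.json else summary.format()
```

Calling `render(summary, ["--json"])` returns the digest string, while `render(summary, [])` returns the human-readable text.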
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| diffly/summary.py | Introduces SummaryData* dataclasses, computes them in _compute_summary_data, refactors rendering to use them, and adds Summary.to_json(). |
| diffly/cli.py | Adds --json flag and switches output between summary.format() and summary.to_json(). |
| tests/cli/test_cli.py | Parametrizes CLI smoke test to cover both Rich output and --json output. |
| tests/summary/test_summary.py | Adds parametrized JSON-digest assertions and unit tests for _to_python() conversions. |
| lexical-sprouting-scroll.md | Adds a design/architecture note describing the refactor and JSON digest approach. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
Motivation
See #27.
Changes
I debated a few different architectural options:
1. A `digest.py` file in parallel to the summary. This led to considerable duplication, because depending on the input parameters of `summary()` such as `top_k_column_changes`, `slim`, etc., different data is put into the summary (and should therefore also be serialized). To avoid this, I switched to the second option.
2. `SummaryData` is populated at initialization of the `Summary` class. The data can then be used for both
   2.1 Rich-rendered summaries
   2.2 Serialized JSON

The second option avoids code and logic duplication entirely but requires a significant refactor of the `summary.py` module.
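A heavily reduced sketch of the second option, to show why it removes the duplication: the options are applied once when the data is computed, and both output paths read the same object. The field names, the `changes` input, and the `max_columns` option are all hypothetical; `max_columns` merely stands in for options such as `top_k_column_changes`, whose real semantics are not shown in this PR description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SummaryData:
    """Hypothetical data layer; the real SummaryData has different fields."""

    equal: bool
    changed_columns: list[str]


class Summary:
    def __init__(self, changes: dict[str, int], max_columns: int) -> None:
        # Options are applied exactly once, here. Both renderers below then
        # see the same, already-filtered data.
        ranked = sorted(changes, key=lambda name: (-changes[name], name))
        self._data = SummaryData(
            equal=not changes,
            changed_columns=ranked[:max_columns],
        )

    def format(self) -> str:
        # Stand-in for the Rich-rendered summary.
        if self._data.equal:
            return "equal"
        return "changed: " + ", ".join(self._data.changed_columns)

    def to_json(self) -> str:
        # Serialization reads the same data object; no option logic is repeated.
        return json.dumps(asdict(self._data), sort_keys=True)
```

With a separate `digest.py`, the truncation in `__init__` would have to be reimplemented for the JSON path; here neither `format()` nor `to_json()` needs to know about the options at all.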