
feat: Add machine-readable digest of comparison#28

Open
Marius Merkle (MariusMerkleQC) wants to merge 13 commits into main from digest

Conversation

Collaborator

@MariusMerkleQC Marius Merkle (MariusMerkleQC) commented Mar 31, 2026

Motivation

See #27.

Changes

I debated a few different architectural options:

  1. First, I added a digest.py file in parallel to the summary. This led to considerable duplication: depending on the input parameters of summary(), such as top_k_column_changes and slim, different data is put into the summary (and should therefore also be serialized). To avoid this, I switched to the second option.
  2. Here, a new data class SummaryData is populated at initialization of the Summary class. The data can then be used for both
    2.1 Rich-rendered summaries
    2.2 Serialized JSON

The second option avoids code and logic duplication entirely but requires a significant refactor of the summary.py module.
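
A minimal sketch of the second option, assuming a heavily simplified data model. The field names (`equal`, `n_rows_left`, `n_rows_right`, `column_match_rates`) are hypothetical and not taken from the PR, the data is passed in rather than computed from a comparison, and plain text stands in for the Rich rendering so the example needs only the standard library:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SummaryData:
    """Hypothetical, simplified stand-in for the PR's SummaryData."""

    equal: bool
    n_rows_left: int
    n_rows_right: int
    # Column name -> fraction of joined rows whose values match.
    column_match_rates: dict[str, float] = field(default_factory=dict)


class Summary:
    def __init__(self, data: SummaryData) -> None:
        # In the PR the data is populated when Summary is initialized;
        # here it is injected to keep the sketch self-contained.
        self._data = data

    def format(self) -> str:
        """Human-readable rendering (plain text here, Rich in the PR)."""
        lines = [
            f"Equal: {self._data.equal}",
            f"Rows: {self._data.n_rows_left} (left) / {self._data.n_rows_right} (right)",
        ]
        for name, rate in self._data.column_match_rates.items():
            lines.append(f"  {name}: {rate:.1%} match")
        return "\n".join(lines)

    def to_json(self) -> str:
        """Machine-readable rendering of exactly the same data."""
        return json.dumps(asdict(self._data))
```

Because both renderers read from the same `_data` object, any option that changes what is collected affects the text and the JSON output identically, which is the duplication the first option could not avoid.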

@MariusMerkleQC Marius Merkle (MariusMerkleQC) linked an issue Mar 31, 2026 that may be closed by this pull request
@github-actions github-actions bot added the enhancement New feature or request label Mar 31, 2026

codecov bot commented Mar 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (8e2216f) to head (bd9aa41).

Additional details and impacted files
@@            Coverage Diff             @@
##              main       #28    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           10        10            
  Lines          776       921   +145     
==========================================
+ Hits           776       921   +145     


@@ -18,7 +18,7 @@

Collaborator Author


This uncovered a bug in the summary fixtures, too.

@@ -0,0 +1,135 @@
# Add `SummaryData` dataclass as the data layer for comparison output
Collaborator Author

@MariusMerkleQC Marius Merkle (MariusMerkleQC) Apr 1, 2026


Will remove before merging. I used this as a plan for Claude, and it's outdated by now.

box=box.HEAVY,
)
)
if self._comparison.equal():
Collaborator Author


The changes below mostly reference the _data object instead of the _comparison object.


Copilot AI left a comment


Pull request overview

This PR adds a machine-readable JSON “digest” of the DataFrameComparison.summary() output. It refactors diffly.summary to compute a structured SummaryData once and use it for both Rich rendering and JSON serialization, and it wires up a new --json CLI flag.

Changes:

  • Refactor Summary to compute/store SummaryData and add Summary.to_json() for JSON serialization.
  • Add --json to the CLI to output the JSON digest instead of the Rich-formatted summary.
  • Extend tests and update Rich-output fixtures to reflect the refactor (notably the Columns table formatting).
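
The flag-based switch described above could look roughly like the following. This is a sketch only: it uses `argparse` from the standard library, which may not be what diffly's CLI is built on, and `StubSummary`, `render`, and the `compare-sketch` program name are hypothetical. Only the `--json` flag and the `format()` / `to_json()` method names come from the PR.

```python
import argparse
import json


class StubSummary:
    """Hypothetical stand-in exposing the two methods named in the PR."""

    def format(self) -> str:
        return "Equal: True"

    def to_json(self) -> str:
        return json.dumps({"equal": True})


def render(summary: StubSummary, argv: list[str]) -> str:
    """Return the JSON digest if --json is passed, else the formatted summary."""
    parser = argparse.ArgumentParser(prog="compare-sketch")
    parser.add_argument(
        "--json",
        dest="as_json",
        action="store_true",
        help="emit the JSON digest instead of the formatted summary",
    )
    args = parser.parse_args(argv)
    return summary.to_json() if args.as_json else summary.format()
```

Calling `render(StubSummary(), ["--json"])` yields a JSON string, while an empty argument list yields the formatted text.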

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| diffly/summary.py | Introduces SummaryData* dataclasses, computes them in _compute_summary_data, refactors rendering to use them, and adds Summary.to_json(). |
| diffly/cli.py | Adds --json flag and switches output between summary.format() and summary.to_json(). |
| tests/cli/test_cli.py | Parametrizes CLI smoke test to cover both Rich output and --json output. |
| tests/summary/test_summary.py | Adds parametrized JSON-digest assertions and unit tests for _to_python() conversions. |
| lexical-sprouting-scroll.md | Adds a design/architecture note describing the refactor and JSON digest approach. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/many_pk_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/lost_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/gained_rows_only/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
| tests/summary/fixtures/equal_non_empty_different_columns/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updates expected Rich output fixture after Columns-table formatting changes. |
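
The overview does not show what `_to_python()` does. One plausible reading is that it converts values that `json.dumps` cannot handle (dates, decimals, and so on) into JSON-native Python values. The sketch below illustrates that kind of conversion under the hypothetical name `to_json_safe`; it is not diffly's implementation, and the chosen target representations (ISO strings, seconds, decimal strings) are assumptions.

```python
import datetime as dt
import json
from decimal import Decimal
from typing import Any


def to_json_safe(value: Any) -> Any:
    """Recursively convert common non-JSON types into JSON-native values."""
    if isinstance(value, (dt.datetime, dt.date, dt.time)):
        # ISO 8601 strings keep temporal values readable and sortable.
        return value.isoformat()
    if isinstance(value, dt.timedelta):
        return value.total_seconds()
    if isinstance(value, Decimal):
        # A string avoids silently losing precision in a float.
        return str(value)
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    # str, int, float, bool, and None are already JSON-native.
    return value
```

Running sample rows through a helper like this before `json.dumps` keeps serialization from failing on column types that have no direct JSON equivalent.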


Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

LLM-readable text summaries

2 participants