Fix Gemini CLI JSONL session parsing #171

Open: mike1858 wants to merge 1 commit into main from fix-gemini-cli-jsonl-snapshots

Conversation

@mike1858 mike1858 commented Jun 3, 2026

Summary

Fix Gemini CLI session parsing for newer JSONL chat files.

Gemini CLI can persist message state inside JSONL snapshot rows shaped like:

{"$set":{"messages":[...]}}

Splitrail already handled standalone JSONL message rows, but it ignored every $set row. If a Gemini CLI session stored token-bearing messages in those snapshots, Splitrail could miss the usage locally and, after upload checkpoints advanced, the cloud view would appear to stop receiving Gemini data.
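The effect of skipping snapshot rows can be sketched with a simplified model. The `Row` and `Message` types and both function names below are hypothetical and stand in for already-parsed JSON; they are not taken from the Splitrail source, and the sketch assumes no message appears in more than one row.

```rust
// Hypothetical, simplified model of parsed Gemini CLI JSONL rows.
// The real analyzer works on JSON values; these types are illustrative only.
#[derive(Debug, Clone)]
struct Message {
    tokens: u64,
}

#[derive(Debug, Clone)]
enum Row {
    // A standalone message row.
    Message(Message),
    // A snapshot row shaped like {"$set":{"messages":[...]}}.
    SetMessages(Vec<Message>),
}

// Sums tokens while skipping snapshot rows (the behavior described as the bug).
fn tokens_ignoring_snapshots(rows: &[Row]) -> u64 {
    rows.iter()
        .map(|row| match row {
            Row::Message(m) => m.tokens,
            Row::SetMessages(_) => 0,
        })
        .sum()
}

// Sums tokens while also reading messages carried inside snapshot rows.
// Assumes each message appears only once; deduplication is not modeled here.
fn tokens_including_snapshots(rows: &[Row]) -> u64 {
    rows.iter()
        .map(|row| match row {
            Row::Message(m) => m.tokens,
            Row::SetMessages(msgs) => msgs.iter().map(|m| m.tokens).sum::<u64>(),
        })
        .sum()
}

// Made-up sample session: two standalone rows and one snapshot row.
fn sample_rows() -> Vec<Row> {
    vec![
        Row::Message(Message { tokens: 10 }),
        Row::SetMessages(vec![Message { tokens: 120 }, Message { tokens: 300 }]),
        Row::Message(Message { tokens: 5 }),
    ]
}
```

In this made-up session, most of the usage lives in the snapshot row, so a parser that skips it reports only the two standalone rows.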

This PR:

  • parses $set.messages snapshot rows in Gemini CLI JSONL sessions
  • keeps the latest message version by message id when both snapshots and standalone rows are present
  • ignores Gemini's internal <session_context> setup message so it does not count as a user prompt or become the session name
  • includes *.jsonl in Gemini CLI glob patterns
  • adds regression tests for JSONL snapshot parsing, JSONL glob coverage, and internal session context filtering
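The "latest version by message id" rule in the list above can be sketched as an upsert over an ordered list. This is a minimal illustration under assumptions: `Message`, `upsert_message`, and `latest_versions` are hypothetical names, and the actual helper in the PR may use a different signature and data structure.

```rust
use std::collections::HashMap;

// Hypothetical message record; the real analyzer carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    id: String,
    tokens: u64,
}

// Inserts `msg`, or overwrites the earlier entry that has the same id.
// First-seen order is preserved; the latest version of each id wins.
fn upsert_message(
    messages: &mut Vec<Message>,
    index_by_id: &mut HashMap<String, usize>,
    msg: Message,
) {
    let existing = index_by_id.get(&msg.id).copied();
    match existing {
        Some(i) => messages[i] = msg,
        None => {
            index_by_id.insert(msg.id.clone(), messages.len());
            messages.push(msg);
        }
    }
}

// Folds a stream of messages (from snapshots and standalone rows alike)
// into one entry per id.
fn latest_versions(stream: Vec<Message>) -> Vec<Message> {
    let mut messages = Vec::new();
    let mut index_by_id = HashMap::new();
    for msg in stream {
        upsert_message(&mut messages, &mut index_by_id, msg);
    }
    messages
}

// Small constructor used to keep examples short.
fn msg(id: &str, tokens: u64) -> Message {
    Message { id: id.to_string(), tokens }
}
```

Keeping a side index from id to position means a repeated id replaces its earlier entry in place, so a message that shows up both in a snapshot and as a standalone row is counted once.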

Verification

cargo fmt --all --quiet
cargo build --quiet
cargo test --quiet
cargo clippy --quiet -- -D warnings
cargo doc --quiet

Closes #170

Summary by CodeRabbit

  • New Features

    • Extended file discovery to include .jsonl chat files in addition to .json files
    • Improved parsing logic for better message deduplication and handling of session data records
  • Bug Fixes

    • Internal session context markers embedded in user messages are now automatically filtered for cleaner results

coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b55ed36e-0629-4d1f-b89f-ec17609910eb

📥 Commits

Reviewing files that changed from the base of the PR and between 838bdf9 and 8cfd4d7.

📒 Files selected for processing (2)
  • src/analyzers/gemini_cli.rs
  • src/analyzers/tests/gemini_cli.rs

Walkthrough

GeminiCliAnalyzer now filters user messages containing internal session context, refactors JSONL message deduplication into a reusable helper keyed by message id, adds support for parsing $set records in JSONL files, and extends glob discovery to include *.jsonl files. Comprehensive tests validate the new glob patterns, $set message extraction, and session-name fallback behavior with embedded context.
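As a rough illustration of the widened file discovery, the check below accepts both extensions. It is a stand-in only: the analyzer is described as using glob patterns, and `is_chat_session_file` is a hypothetical name that does not appear in the PR.

```rust
use std::path::Path;

// True when a path has an extension that chat-session discovery should cover.
// The analyzer itself uses glob patterns (*.json and *.jsonl); this extension
// check is a simplified stand-in for that matching.
fn is_chat_session_file(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|ext| ext.to_str()),
        Some("json") | Some("jsonl")
    )
}
```

With a check limited to `json`, a file such as `session.jsonl` would never reach the parser, regardless of how well the parser handles its rows.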

Changes

Gemini CLI JSONL Parsing and Session Context Filtering

| Layer / File(s) | Summary |
| --- | --- |
| Internal session context detection and filtering (src/analyzers/gemini_cli.rs) | Added is_internal_session_context() helper to detect content blocks starting with <session_context>. User-message conversion now skips messages identified as internal session context without emitting a ConversationMessage. |
| JSONL message deduplication and $set record handling (src/analyzers/gemini_cli.rs) | Introduced upsert_jsonl_message helper consolidating deduplication logic by message id. JSONL parsing now handles $set records by iterating and upserting each message from value.$set.messages, while maintaining prior behavior for non-$set entries. Extended glob patterns to include *.jsonl in addition to *.json. |
| Test coverage for JSONL handling and context filtering (src/analyzers/tests/gemini_cli.rs) | Added unit test asserting glob patterns include both *.json and *.jsonl. Added async tests validating $set message parsing with correct message count, roles, and token/stat fields; and validating that internal session context in user messages does not affect session-name fallback while preserving expected message output. |
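The context filter and the session-name fallback described in the table can be sketched as follows. The function names `looks_like_session_context` and `session_name` are hypothetical, and tolerating leading whitespace is an assumption of this sketch; the PR's own helper may behave differently.

```rust
const SESSION_CONTEXT_TAG: &str = "<session_context>";

// True when the content looks like Gemini's internal setup message.
// Tolerating leading whitespace is an assumption of this sketch.
fn looks_like_session_context(content: &str) -> bool {
    content.trim_start().starts_with(SESSION_CONTEXT_TAG)
}

// Picks a session name from the first user message that is not internal
// context, or returns None when every message is internal.
fn session_name<'a>(user_messages: &[&'a str]) -> Option<&'a str> {
    user_messages
        .iter()
        .copied()
        .find(|m| !looks_like_session_context(m))
}
```

Filtering before the name is chosen means a session that opens with the setup block is still named after the first real prompt.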

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • Piebald-AI/splitrail#155: Both PRs modify src/analyzers/gemini_cli.rs to extend GeminiCliAnalyzer's JSONL parsing—adding per-id latest-message upsertion/dedup (including $set-driven updates) and updated .json/.jsonl handling—so the changes are directly related at the parser code level.

Poem

🐰 A fuzzy quest through JSON lines,
Where $set records align,
Internal whispers filtered out,
Sessions tracked without a doubt—
Gemini's tokens now ring true!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix Gemini CLI JSONL session parsing' directly and clearly summarizes the main change: addressing parsing issues with Gemini CLI JSONL chat sessions. |
| Linked Issues check | ✅ Passed | The PR directly addresses the secondary objective from #170 (missing Gemini CLI usage data by fixing JSONL snapshot row parsing and internal message filtering), though it focuses on parsing fixes rather than the primary token count categorization request. |
| Out of Scope Changes check | ✅ Passed | All changes focus on JSONL parsing improvements, message deduplication, internal context filtering, and glob pattern updates, all directly related to the stated objective of fixing Gemini CLI session data ingestion. |

Development

Successfully merging this pull request may close these issues.

anyway to capture token count by request