#50 implement DRY principle by Asifdotexe · Pull Request #51 · Asifdotexe/Theseus

Asifdotexe · 2026-06-16T07:14:34Z

Summary by CodeRabbit

New Features
- Added fossil update auto-detection: performs full data backfill if genesis is missing; otherwise runs incremental refresh.
- Added manual GitHub Actions workflow trigger for fossil updates.
Refactor
- Reorganized pipeline workflow to use artifact-based data distribution.
- Simplified internal path resolution and helper utilities.
- Removed unused repository cloning logic.
Tests
- Data integrity tests now skip gracefully when no processed data is available.
Chores
- Updated workflow action versions.
- Added temporary directories to gitignore.

coderabbitai · 2026-06-16T07:14:53Z

Warning

Review limit reached

@Asifdotexe, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 40 minutes and 39 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: de8bfe7a-154f-46ed-aebb-10c86d9780db

📥 Commits

Reviewing files that changed from the base of the PR and between ab7fee2 and c683e1f.

📒 Files selected for processing (5)

.github/workflows/theseus-engine.yml
.github/workflows/unit-tests.yml
.github/workflows/update-fossils.yml
scripts/_blame.py
scripts/workflow.py

📝 Walkthrough

Walkthrough

The PR removes the _path_guard sys.path injection pattern from all Python scripts, migrates intra-package imports to scripts.* namespaced paths, simplifies _blame.py and _utils.py logic, makes ensure_repo_ready public in analyse_repository.py, adds an auto_update_fossils dispatcher in add_fossils.py, reworks theseus-engine.yml to an artifact-based pipeline, adds a new update-fossils.yml workflow, and removes dead code from app.js.

Changes

app.js Dead Code Removal

Layer / File(s)	Summary
renderMilestoneMarkers, updateInsights, and createSVGElement cleanup `app.js`	Removes stray `const tooltip = this;` and adds proper `repoInfo` lookup in `renderMilestoneMarkers`; drops unused `first` local in `updateInsights`; deletes the `createSVGElement` helper method from `TheseusVisualizer`.

Python Pipeline Refactoring and Orchestration

Layer / File(s)	Summary
*_path_guard removal and scripts. import migration** `scripts/_path_guard.py`, `scripts/__init__.py`, `scripts/_blame.py`, `scripts/add_fossils.py`, `scripts/analyse_repository.py`, `scripts/cleanup_data.py`, `scripts/workflow.py`, `scripts/run_pipeline.py`, `scripts/_utils.py`	Empties `_path_guard.py` and removes sys.path injection from `__init__.py`; drops all `import _path_guard` side-effect imports; switches all intra-package imports to `scripts.*` namespaced paths; updates schema doc references to point to `_data_io.py`.
_blame.py logic simplification and _utils.py remove_path rewrite `scripts/_blame.py`, `scripts/_utils.py`	Rewrites `_is_hash()` to a compact len+strip check; simplifies fossil commit derivation to a direct string slice; removes `blame_single_file_year_counts` and `blame_year_counts`; changes `BlameRunner.__init__` max_workers to `int \| None`; replaces `remove_path` with a single `shutil.rmtree(path, ignore_errors=True)` call.
analyse_repository.py: ensure_repo_ready public, max_workers: None `scripts/analyse_repository.py`	Removes local `clone_repository`; changes `max_workers` parameter type to `int \| None` in three blame/guard helpers; sets `max_workers = None` unconditionally in `analyze_single_snapshot`; replaces private `_ensure_repo_ready` with a public `ensure_repo_ready` that clones-or-fetches and checks out the default branch; updates `process_repository` to call the new function.
auto_update_fossils dispatcher and workflow.py cleanup `scripts/add_fossils.py`, `scripts/workflow.py`	Adds `_check_genesis` and `auto_update_fossils` to `add_fossils.py`: per-repo dispatcher that runs `backfill_fossils` when genesis is missing or `update_survivor_fossils` when it exists; `main()` calls `auto_update_fossils` by default. Removes `discover-repos` CLI subcommand from `workflow.py`.
CI workflow rework and new update-fossils workflow `.github/workflows/theseus-engine.yml`, `.github/workflows/update-fossils.yml`, `.github/workflows/unit-tests.yml`, `.gitignore`	`theseus-engine.yml`: `discover-repos` uses `jq` over `theseus.config.json`; `analyze` job uploads per-repo artifacts instead of pushing to shared branch; `create-pr` job downloads all artifacts then always commits and force-pushes. New `update-fossils.yml`: manual workflow that runs `add_fossils.py` + `cleanup_data.py` and force-pushes to `chore/fossil-update`. `unit-tests.yml` unpins action versions from commit SHAs. Adds `temp_fossil_repos/` to `.gitignore`.
test_data_integrity graceful skip `tests/test_data_integrity.py`	Adds `pytest` import; replaces hard assertion failure with `pytest.skip(...)` when no JSON data files are found.

Sequence Diagram(s)

sequenceDiagram
    participant GHA as GitHub Actions
    participant discover as discover-repos job
    participant analyze as analyze job (matrix)
    participant artifacts as GitHub Artifacts
    participant create_pr as create-pr job
    participant branch as chore/monthly-data-update

    GHA->>discover: trigger theseus-engine workflow
    discover->>discover: jq theseus.config.json → repos matrix
    discover->>analyze: pass repos matrix
    loop per repo
        analyze->>analyze: poetry run python -m scripts.run_pipeline
        analyze->>analyze: write data/.status/<repo>.json
        analyze->>artifacts: upload data-<repo> artifact (1-day retention)
    end
    artifacts->>create_pr: download all data-* artifacts
    create_pr->>create_pr: workflow.py build-pr-body
    create_pr->>create_pr: workflow.py validate-graph-files
    create_pr->>branch: commit data/ and force-push

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Asifdotexe/Theseus#9: The createSVGElement helper removed in this PR was originally introduced in that PR, making them directly code-level related.
Asifdotexe/Theseus#43: Both PRs modify .github/workflows/theseus-engine.yml's CI pipeline structure, specifically how the analyze job passes data and how create-pr assembles it.
Asifdotexe/Theseus#46: This PR removes the _path_guard/sys.path bootstrapping scaffolding and the scripts/workflow.py discover-repos subcommand that PR #46 introduced.

Suggested labels

enhancement

🐇 Hops through the code with glee,
_path_guard no more — imports run free!
Dead code swept clean from app.js bright,
Artifacts carry data through the night.
auto_update_fossils picks the right way,
A tidy pipeline greets the new day! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title "`#50` implement DRY principle" is vague and generic, using non-descriptive language that does not convey the specific changes made in the changeset.	Provide a more specific title that describes the actual changes, such as: "Remove sys.path manipulation and refactor imports to use package-relative paths" or "Simplify helper functions and restructure fossil update logic."

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch 50-code-optimisations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Asifdotexe · 2026-06-16T07:20:05Z

+logger = logging.getLogger(__name__)
+
+
+def clone_repository(repo_slug: str, clone_dir: str) -> None:


Add docstrings

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/_utils.py`:
- Around line 199-205: The remove_path function uses Path(path) without
importing it from the pathlib module, causing a NameError at runtime. Add an
import statement at the top of scripts/_utils.py to import Path from pathlib,
ensuring it is available when the remove_path function executes.

In `@scripts/analyse_repository.py`:
- Around line 35-38: The bare internal imports (from _blame, from _data_io, from
_repo, and from _utils) are no longer safe after removing _path_guard and will
cause import failures when the module is loaded through scripts.run_pipeline.
Update all these import statements to use package-qualified paths by prefixing
each module name with "scripts." (for example, change "from _blame import" to
"from scripts._blame import", and similarly for _data_io, _repo, and _utils
modules to ensure consistent and safe imports).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be6d1ec2-7eed-43cc-9aaf-db580843e1b4

📥 Commits

Reviewing files that changed from the base of the PR and between 07f6681 and cdd499d.

📒 Files selected for processing (16)

app.js
data/raw/claude-code_data.json
data/raw/langchain_data.json
data/raw/numpy_data.json
data/raw/react_data.json
scripts/__init__.py
scripts/_blame.py
scripts/_path_guard.py
scripts/_repo.py
scripts/_utils.py
scripts/add_fossils.py
scripts/analyse_repository.py
scripts/cleanup_data.py
scripts/run_pipeline.py
scripts/workflow.py
tests/test_data_integrity.py

💤 Files with no reviewable changes (5)

scripts/_path_guard.py
scripts/init.py
scripts/workflow.py
scripts/cleanup_data.py
app.js

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/update-fossils.yml:
- Around line 50-53: The git rm -rf --cached . command on line 52 in the else
block stages the deletion of all tracked files in the repository when creating
the chore/fossil-update branch, which would result in a commit that deletes
application and workflow files. Replace this overly broad command with a more
targeted operation that only unstages the fossil data files that need to be
re-added (typically in the data/ directory), or remove this command entirely if
it is not necessary for the update workflow.
- Around line 16-21: The workflow uses mutable action version tags and exposes
the write token to intermediate steps. Pin actions/checkout and
actions/setup-python to their full commit SHAs instead of version tags. In the
checkout action, add persist-credentials: false to prevent the token from being
exposed to the pipx install, poetry install, and poetry run python
scripts/update_fossils.py steps. At the final push step (around line 68),
explicitly pass the token using env or with parameters instead of relying on the
persisted credential from checkout.

In `@scripts/update_fossils.py`:
- Around line 113-116: The generate_graph_data function call checks for errors
and logs them, but the script continues to exit successfully even when
graph_error is present. To prevent stale or incomplete processed graph data from
being committed, add a process exit with a non-zero exit code (such as
process.exit(1) or sys.exit(1) depending on the language context) immediately
after the logger.error call when graph_error is truthy, ensuring the script
fails when graph regeneration encounters errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82aea780-3925-40dc-bd51-32a4e21b3fb9

📥 Commits

Reviewing files that changed from the base of the PR and between cdd499d and b670436.

📒 Files selected for processing (3)

.github/workflows/update-fossils.yml
.gitignore
scripts/update_fossils.py

✅ Files skipped from review due to trivial changes (1)

.gitignore

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/_blame.py (1)

202-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

TypeError when max_workers is None due to %d format specifier.

self.max_workers can now be None, but the log statement uses %d which requires an integer. This will raise TypeError: %d format: a real number is required, not NoneType.

🐛 Proposed fix

         logger.info(
-            "  Blaming %d changed files (%d workers)...",
+            "  Blaming %d changed files (%s workers)...",
             len(files),
-            self.max_workers,
+            self.max_workers if self.max_workers is not None else "default",
         )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/_blame.py` around lines 202 - 206, The logger.info call uses the %d
format specifier for self.max_workers which can now be None, causing a
TypeError. Replace self.max_workers in the format string with a conditional
expression that provides a sensible default or string representation when it is
None, or change the %d format specifier to %s which can handle both integer and
None values appropriately.

🧹 Nitpick comments (1)

.github/workflows/theseus-engine.yml (1)
58-66: Pin artifact actions to commit SHAs for supply-chain security.

actions/upload-artifact@v4 (line 59) and actions/download-artifact@v4 (line 79) use mutable version tags. Pinning to full commit SHAs prevents potential supply-chain attacks if the action maintainer's account is compromised.

Latest versions available: actions/upload-artifact@v7.0.1 (043fb46d1a93c77aae656e7c1c64a875d1fc6a0a) and actions/download-artifact@v8.0.1 (3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c).

Also applies to: 78-83
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/theseus-engine.yml around lines 58 - 66, Replace the
mutable version tags for GitHub Actions with pinned commit SHAs to improve
supply-chain security. Update the uses field for actions/upload-artifact from
`@v4` to the full commit SHA 043fb46d1a93c77aae656e7c1c64a875d1fc6a0a, and update
actions/download-artifact from `@v4` to the full commit SHA
3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c. The format should be
actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a and
actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c.
Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/theseus-engine.yml:
- Around line 49-56: The matrix.repo value is being directly interpolated into
shell commands in the poetry run command and the echo statement, which creates a
template injection vulnerability if the repo name contains shell metacharacters.
Replace the direct interpolation of ${{ matrix.repo }} with an environment
variable by adding an env section to these steps that assigns matrix.repo to an
environment variable (e.g., REPO_NAME), then reference the environment variable
(e.g., $REPO_NAME) instead of the direct interpolation in both the poetry run
command on line 49 and the echo command on line 56.

In @.github/workflows/unit-tests.yml:
- Line 15: Replace the version tag pinning in the GitHub Actions workflow with
full commit SHAs to prevent supply-chain attacks. For the actions/checkout@v4
reference on line 15 and the actions/setup-python reference on line 20, visit
their respective GitHub releases pages (github.com/actions/checkout/releases and
github.com/actions/setup-python/releases), find the full commit SHA for the
version tag being used, and update each action reference from the tag format
(e.g., `@v4`) to the complete commit SHA format (e.g., `@abc123def456`...).

---

Outside diff comments:
In `@scripts/_blame.py`:
- Around line 202-206: The logger.info call uses the %d format specifier for
self.max_workers which can now be None, causing a TypeError. Replace
self.max_workers in the format string with a conditional expression that
provides a sensible default or string representation when it is None, or change
the %d format specifier to %s which can handle both integer and None values
appropriately.

---

Nitpick comments:
In @.github/workflows/theseus-engine.yml:
- Around line 58-66: Replace the mutable version tags for GitHub Actions with
pinned commit SHAs to improve supply-chain security. Update the uses field for
actions/upload-artifact from `@v4` to the full commit SHA
043fb46d1a93c77aae656e7c1c64a875d1fc6a0a, and update actions/download-artifact
from `@v4` to the full commit SHA 3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c. The
format should be
actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a and
actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e15bf646-86bf-434b-8f2a-f227530982de

📥 Commits

Reviewing files that changed from the base of the PR and between b670436 and ab7fee2.

📒 Files selected for processing (10)

.github/actions/setup-python-poetry/action.yml
.github/workflows/theseus-engine.yml
.github/workflows/unit-tests.yml
.github/workflows/update-fossils.yml
scripts/_blame.py
scripts/_utils.py
scripts/add_fossils.py
scripts/analyse_repository.py
scripts/cleanup_data.py
scripts/workflow.py

💤 Files with no reviewable changes (1)

.github/actions/setup-python-poetry/action.yml

#50 implement DRY principle

cdd499d

Asifdotexe self-assigned this Jun 16, 2026

Asifdotexe linked an issue Jun 16, 2026 that may be closed by this pull request

Code optimisations #50

Closed

Asifdotexe commented Jun 16, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread scripts/_utils.py Outdated

Comment thread scripts/analyse_repository.py Outdated

Asifdotexe added 2 commits June 16, 2026 14:22

#50 add workflow to update fossil

b670436

#50 remove actions folder

eb14224

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread .github/workflows/update-fossils.yml Outdated

Comment thread .github/workflows/update-fossils.yml Outdated

Comment thread scripts/update_fossils.py Outdated

Asifdotexe added 3 commits June 16, 2026 14:35

#50 simplified the imports

0ff4c9e

#50 make the workflow leaner

ab7fee2

#50 applied pep-8

5366156

coderabbitai Bot reviewed Jun 20, 2026

View reviewed changes

Comment thread .github/workflows/theseus-engine.yml Outdated

Comment thread .github/workflows/unit-tests.yml Outdated

Asifdotexe added 2 commits June 20, 2026 12:15

#50 updated update fossil

71bcff7

#50 address workflow vulnerabilities and logger type error

c683e1f

Asifdotexe merged commit b644f30 into main Jun 20, 2026
2 checks passed

Asifdotexe deleted the 50-code-optimisations branch June 20, 2026 06:54

		logger = logging.getLogger(__name__)


		def clone_repository(repo_slug: str, clone_dir: str) -> None:

Conversation

Asifdotexe commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 inconclusive)

Uh oh!

Asifdotexe Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Asifdotexe commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Asifdotexe Jun 16, 2026 •

edited

Loading