
feat(indexing): respect .gitignore when indexing #29

Merged
torresmateo merged 2 commits into main from 28-respect-gitignore-when-indexing
May 25, 2026

torresmateo (Collaborator) commented May 22, 2026

Closes #28.

Summary

  • Aggregate patterns from every .gitignore under the indexed root and skip matching files during indexing.
  • Hardcoded baseline (.git, node_modules, __pycache__, caches, binaries, fonts, media) still applies — .gitignore layers on top.
  • New --include-ignored flag on libr add (and include_ignored arg on index_directory_to_library) opts out. The flag is persisted on the source entry and honored by libr index build on rebuild.
  • The previously duplicated _should_skip_file in cli.py and server.py now delegates to a shared helper in librarian/sources/ignore.py. That module also houses the new GitignoreMatcher, which uses pathspec's GitIgnoreSpec under the hood and rewrites patterns from nested .gitignore files so that they are anchored under their containing directory.
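
The nested-pattern rewriting described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in librarian/sources/ignore.py: the names `anchor_pattern` and `aggregate` are hypothetical, and escaped characters such as `\#` or `\!` are not handled. The idea is that a pattern from `docs/.gitignore` is rewritten so that it only matches below `docs/` once all patterns are merged into a single root-relative list, which could then be handed to pathspec's `GitIgnoreSpec.from_lines`.

```python
from __future__ import annotations


def anchor_pattern(pattern: str, rel_dir: str) -> str | None:
    """Rewrite one .gitignore line so it is scoped to rel_dir.

    rel_dir is the POSIX-style directory of the .gitignore file relative
    to the indexed root ("" for the root itself). Returns None for blank
    lines and comments. Hypothetical helper, for illustration only.
    """
    line = pattern.rstrip()
    if not line or line.startswith("#"):
        return None

    negated = line.startswith("!")
    body = line[1:] if negated else line

    # Patterns from the root .gitignore already have the right scope.
    if rel_dir in ("", "."):
        return line

    prefix = "/" + rel_dir.strip("/")
    core = body.rstrip("/")  # a trailing slash only means "directory"
    if "/" in core:
        # A leading or inner slash anchors the pattern to the .gitignore dir.
        rewritten = f"{prefix}/{body.lstrip('/')}"
    else:
        # Floating pattern: matches at any depth below the .gitignore dir.
        rewritten = f"{prefix}/**/{body}"
    return "!" + rewritten if negated else rewritten


def aggregate(gitignores: dict[str, list[str]]) -> list[str]:
    """Merge {rel_dir: lines} into one root-relative pattern list.

    Shallower directories come first so that deeper .gitignore files,
    which appear later in the list, take precedence.
    """
    merged: list[str] = []
    ordered = sorted(gitignores, key=lambda d: (d.count("/") if d else -1, d))
    for rel_dir in ordered:
        for line in gitignores[rel_dir]:
            rewritten = anchor_pattern(line, rel_dir)
            if rewritten is not None:
                merged.append(rewritten)
    return merged
```

Ordering the merged list from shallow to deep relies on gitignore's last-match-wins rule, so a negation in a nested file can re-include something excluded higher up.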

Test plan

  • make check clean (lint, format, mypy)
  • make test-fast — 96 passed / 8 skipped
  • 13 new unit tests in tests/test_gitignore.py covering: no-gitignore, root patterns, floating patterns at any depth, anchored patterns, nested-gitignore scoping, negation, out-of-root paths, the always-skip baseline, and the include-ignored path
  • CLI smoke test: created a tree with build/ in .gitignore; libr add --dry-run shows 1 file, libr add --dry-run --include-ignored shows 2 files
  • E2E smoke test on synthetic node_modules/ tree: SKIP node_modules/pkg/skip.md, KEEP src/readme.md

Aggregates patterns from every .gitignore under the indexed root and
skips matching files. Hardcoded baseline (.git, node_modules, caches,
binaries) still applies. New --include-ignored flag on libr add (and
include_ignored arg on index_directory_to_library) opts out.

The previously-duplicated _should_skip_file in cli.py and server.py
now delegates to a shared helper in librarian/sources/ignore.py.

Comment thread librarian/sources/ignore.py (outdated)

Lets users index files that the gitignore aggregator or the skip-dirs
baseline would otherwise exclude (e.g. a specific package under
node_modules), per PR #29 review feedback.

- New --force-include flag on `libr add` and `force_include` arg on
  `index_directory_to_library`. Persisted on the source entry and
  honored on rebuild.
- New `.librariantrack` file format (gitignore-syntax) opts specific
  patterns back in from any directory under the source root.
- Force-include bypasses the skip-dirs baseline and any .gitignore match
  but does not rescue unsupported/binary/hidden files (the indexer can't
  parse them anyway).
- The skip-dirs baseline and binary-extension list moved to
  `librarian/config.py` as INDEX_SKIP_DIRS / INDEX_SKIP_EXTENSIONS,
  env-overridable instead of hardcoded.
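
The precedence described in the bullets above can be sketched as a single decision function. This is an illustration under stated assumptions, not the project's implementation: `env_set` and `should_skip` are hypothetical names, the comma-separated environment variable format is assumed, and the `gitignored` and `force_included` flags stand in for whatever the .gitignore and `.librariantrack` matchers report.

```python
from __future__ import annotations

import os
from pathlib import PurePosixPath


def env_set(name: str, default: set[str]) -> set[str]:
    """Read a comma-separated override from the environment.

    The variable name and the comma-separated format are assumptions.
    Falls back to the default when the variable is unset or empty.
    """
    raw = os.environ.get(name, "")
    items = {item.strip() for item in raw.split(",") if item.strip()}
    return items or set(default)


def should_skip(
    rel_path: str,
    *,
    gitignored: bool,
    force_included: bool,
    skip_dirs: set[str],
    skip_exts: set[str],
) -> bool:
    """Decide whether a root-relative file path is skipped during indexing."""
    path = PurePosixPath(rel_path)

    # Hidden and binary files are never rescued: the indexer cannot parse them.
    if path.name.startswith(".") or path.suffix.lower() in skip_exts:
        return True

    # Force-include bypasses both the skip-dirs baseline and .gitignore.
    if force_included:
        return False

    # Baseline: any parent directory on the skip list excludes the file.
    if any(part in skip_dirs for part in path.parts[:-1]):
        return True

    # Otherwise defer to the aggregated .gitignore result.
    return gitignored
```

Checking the unparseable cases first keeps force-include from ever returning a file the indexer would fail on, which matches the behaviour stated in the commit message.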
torresmateo requested a review from sdserranog May 25, 2026 17:12
torresmateo merged commit e3872c6 into main May 25, 2026
7 checks passed

Development

Successfully merging this pull request may close these issues.

Respect .gitignore when indexing (skip node_modules etc.)