
feat(scan): support .codegraphignore to override .gitignore and defaults #643

Open
amfenix wants to merge 1 commit into colbymchenry:main from amfenix:feat/codegraphignore



amfenix commented Jun 2, 2026

Summary

Adds an optional project-root .codegraphignore — the final authority on what
the indexer includes, overriding the built-in default-ignores and every
.gitignore (root, nested, and files git itself ignores).

Closes #511. Relates to #622 (provides an opt-in config path to index an
embedded repo hidden by a super-repo's .gitignore; not the automatic
discovery proposed there).

Alternative to / supersedes #531 — thanks @arkrolin for kicking this off. Beyond
that PR, this also: resurfaces gitignored dirs on git repos (#531 re-includes
only on the non-git walk path, so #511's own !system/ example stays invisible
in a git repo), overrides nested .gitignore, keeps dependency/build dirs
out of a broad include (code-aware), handles embedded repos (#622), keeps
getChangedFiles/the watcher consistent with the index, and adds tests + docs.

Problem

File discovery is driven by git ls-files (with a filesystem-walk fallback for
non-git projects), so anything excluded by .gitignore is invisible — even when
that's where the real source lives. Three stacked exclusions that have no
workaround today:

A .gitignore negation (!vendor/) only works for the built-in defaults, not
for paths git itself ignores, so it can't reach any of the above.

Solution

A project-root .codegraphignore (gitignore syntax), consulted as a final
override layer:

  • path: force-exclude (drop even if it'd otherwise be indexed)
  • !path: force-include (index even if git/.gitignore/defaults hide it)
  • last matching line wins, so you can re-include a tree then trim a few files
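The directive semantics above (plain lines exclude, "!" lines include, the last matching line wins) can be sketched in a few lines. This is a hypothetical, simplified illustration, not the PR's CodegraphOverride: parseOverride, compilePattern and verdict are made-up names, only "*" wildcards are supported, and every pattern is anchored at the project root, which real gitignore matching does not require.

```typescript
// Hypothetical sketch of last-match-wins evaluation (NOT the PR's code).
// Simplified syntax: "#" comment lines, "!" negation, "*" within one path
// segment, optional trailing "/". Every pattern is root-anchored and matches
// the path itself or anything beneath it.

type Verdict = "include" | "exclude";

interface Directive {
  negate: boolean; // true for "!path" (force-include)
  regex: RegExp; // root-anchored matcher for the pattern
}

function compilePattern(pattern: string): RegExp {
  const body = pattern.replace(/^\//, "").replace(/\/$/, "");
  // Escape regex metacharacters, then let "*" match any run of non-"/" chars.
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp(`^${escaped}(/.*)?$`);
}

function parseOverride(text: string): Directive[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const negate = line.startsWith("!");
      return { negate, regex: compilePattern(negate ? line.slice(1) : line) };
    });
}

// Each matching directive overwrites the previous result, so the last one
// wins. undefined means "no opinion": fall back to .gitignore and defaults.
function verdict(directives: Directive[], relPath: string): Verdict | undefined {
  let result: Verdict | undefined;
  for (const d of directives) {
    if (d.regex.test(relPath)) result = d.negate ? "include" : "exclude";
  }
  return result;
}

// Usage: re-include a tree, then trim part of it.
const rules = parseOverride("!app/\napp/generated/\n");
console.log(verdict(rules, "app/src/main.ts")); // include
console.log(verdict(rules, "app/generated/api.ts")); // exclude (later line wins)
console.log(verdict(rules, "lib/util.ts")); // undefined
```

Returning undefined rather than a default keeps the override a pure overlay: paths it does not mention are left to the existing .gitignore and built-in rules.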

Code-aware force-include

A broad !app/ re-includes that subtree's source, but still leaves built-in
dependency/build dirs (node_modules, dist, .yarn, …) out — unless an
include anchor reaches into one explicitly (!app/node_modules/mypkg/). So
!app/ means "index app's code", not "index app's dependencies".
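One way to express the code-aware rule: an include anchor resurfaces a path only if no dependency/build directory sits between the anchor and the path. The sketch below is hypothetical; codeAwareInclude and the abbreviated DEP_DIRS set are assumptions (the built-in default list is longer), and it handles only literal anchors without wildcards.

```typescript
// Hypothetical sketch of the code-aware force-include rule. DEP_DIRS is an
// assumed, abbreviated stand-in for the built-in dependency/build defaults.
const DEP_DIRS = new Set(["node_modules", "dist", ".yarn"]);

// Does the literal include anchor (e.g. "app/" or "app/node_modules/mypkg/")
// resurface relPath? Only segments *below* the anchor are checked, so naming
// a dependency dir inside the anchor itself opts it back in.
function codeAwareInclude(anchor: string, relPath: string): boolean {
  const anchorSegs = anchor.split("/").filter((s) => s !== "");
  const pathSegs = relPath.split("/").filter((s) => s !== "");
  // The anchor must be a whole-segment prefix of the path.
  if (anchorSegs.length > pathSegs.length) return false;
  if (!anchorSegs.every((seg, i) => seg === pathSegs[i])) return false;
  // Any dependency/build directory between the anchor and the path blocks it.
  return !pathSegs.slice(anchorSegs.length).some((seg) => DEP_DIRS.has(seg));
}

// Usage:
console.log(codeAwareInclude("app/", "app/src/index.ts")); // true
console.log(codeAwareInclude("app/", "app/node_modules/left-pad/index.js")); // false
console.log(codeAwareInclude("app/node_modules/mypkg/", "app/node_modules/mypkg/index.js")); // true
```

Checking only the segments below the anchor is what lets "!app/node_modules/mypkg/" opt a single package back in while "!app/" still skips every node_modules under app.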

Routing

Git can't enumerate files it ignores (nor cross an embedded-repo boundary), so a
.codegraphignore containing any force-include routes the scan to the
git-agnostic filesystem walk, which already layers nested .gitignores and the
built-in defaults. With no force-include present, the git fast path is kept and
only force-excludes are applied.
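The routing rule reduces to a small decision function. This is a hypothetical sketch: ScanStrategy, hasForceInclude and chooseStrategy are illustrative names, not the PR's identifiers, and the override is passed in as raw file text (null when the file is absent).

```typescript
// Hypothetical sketch of the routing rule; names are illustrative only.
type ScanStrategy = "git-ls-files" | "filesystem-walk";

// A force-include is any line starting with "!", as in gitignore syntax.
// Comment lines start with "#" and an escaped literal "\!name" starts with
// "\", so neither is counted.
function hasForceInclude(overrideText: string): boolean {
  return overrideText.split(/\r?\n/).some((line) => line.startsWith("!"));
}

function chooseStrategy(overrideText: string | null, isGitRepo: boolean): ScanStrategy {
  // Non-git projects already use the filesystem walk.
  if (!isGitRepo) return "filesystem-walk";
  // git ls-files cannot enumerate files git ignores or cross an embedded-repo
  // boundary, so any force-include routes the scan to the walk.
  if (overrideText !== null && hasForceInclude(overrideText)) return "filesystem-walk";
  // Otherwise keep the fast path; plain (force-exclude) lines are applied
  // to its output afterwards.
  return "git-ls-files";
}

// Usage:
console.log(chooseStrategy(null, true)); // git-ls-files
console.log(chooseStrategy("build/\n!vendor/\n", true)); // filesystem-walk
```

Keying the decision on the presence of any "!" line keeps the common case (no override, or excludes only) on the faster git path.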

Worked example (the motivating real-world case, #622)

A workspace where the root .gitignore excludes environment/, which is its
own embedded git repo, whose own .gitignore further hides src/app-* and
src/common. Before: indexing from the root captured 0 of that code. With:

# .codegraphignore
# index environment's code (deps stay excluded)
!environment/
# trim tooling noise that a broad include pulls in
environment/.idea/
environment/.pnp.cjs

→ all six app-* dirs + common indexed (~2.7k source files), while
node_modules / dist / .yarn / storybook-static stay excluded.

What changed

  • src/extraction/index.ts: loadCodegraphOverride() + CodegraphOverride
    (last-match-wins directives, anchors, code-aware dep rule); wired into
    scanDirectory/scanDirectoryAsync (routing), getGitVisibleFiles
    (force-exclude), scanDirectoryWalk (verdict + descent into excluded dirs to
    reach a buried include), getChangedFiles (skip the git status fast path).
  • src/sync/watcher.ts — the watcher honors the override so watch scope tracks
    index scope.
  • __tests__/codegraphignore.test.ts — 12 tests.
  • CHANGELOG.md, README.md — docs.
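The "descent into excluded dirs to reach a buried include" behavior of scanDirectoryWalk can be illustrated with a small in-memory walk: an excluded directory is normally pruned, but the walk still enters it when a force-include anchor lies beneath it, and files outside the anchor stay excluded. This is a hypothetical sketch over a made-up Tree type, not the PR's walker; it omits force-excludes and the code-aware dependency rule.

```typescript
// Hypothetical sketch; Tree, walk, underAnchor and anchorBelow are made up.
type Tree = { [name: string]: Tree | null }; // null marks a file

const strip = (a: string) => a.replace(/\/$/, "");

// True if relPath is an anchor itself or lies beneath one.
function underAnchor(relPath: string, anchors: string[]): boolean {
  return anchors.map(strip).some((a) => relPath === a || relPath.startsWith(a + "/"));
}

// True if some anchor lies strictly beneath dirRel, so the walk must enter it.
function anchorBelow(dirRel: string, anchors: string[]): boolean {
  return anchors.map(strip).some((a) => a.startsWith(dirRel + "/"));
}

function walk(
  tree: Tree,
  isExcluded: (relPath: string) => boolean,
  anchors: string[],
  prefix = "",
  parentExcluded = false,
): string[] {
  const files: string[] = [];
  for (const [name, child] of Object.entries(tree)) {
    const rel = prefix ? `${prefix}/${name}` : name;
    // Exclusion is inherited from ancestors (root or nested .gitignore, defaults).
    const excluded = parentExcluded || isExcluded(rel);
    const forced = underAnchor(rel, anchors);
    if (child === null) {
      if (forced || !excluded) files.push(rel);
    } else if (forced || !excluded || anchorBelow(rel, anchors)) {
      // Pass `excluded` down so siblings of a buried anchor stay excluded.
      files.push(...walk(child, isExcluded, anchors, rel, excluded));
    }
  }
  return files;
}

// Usage: root ignores environment/ and vendor/; a nested rule hides
// environment/src/app-a. The anchors reach into both excluded trees.
const demo: Tree = {
  src: { "main.ts": null },
  environment: { src: { "app-a": { "index.ts": null }, common: { "util.ts": null } }, "notes.txt": null },
  vendor: { keep: { "a.js": null }, "drop.js": null },
};
const hidden = new Set(["environment", "vendor", "environment/src/app-a"]);
console.log(walk(demo, (rel) => hidden.has(rel), ["environment/src/", "vendor/keep/"]));
// lists src/main.ts, both environment/src files, and vendor/keep/a.js
```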

Tests

New suite covers: loader null cases, force-exclude/include, nested-.gitignore
override, descent into an excluded dir, code-aware dep handling (broad include
doesn't resurface node_modules; explicit anchor does), git-path routing &
resurfacing, and a regression guard that behavior is unchanged when no
.codegraphignore exists. Both git and non-git variants are tested. Existing
scan / submodule / embedded-repo / watcher suites
still pass.

Safety

All behavior is gated: with no .codegraphignore, loadCodegraphOverride
returns null and every call site collapses to today's exact behavior (pinned
by the regression test). No changes to DB schema, types, MCP tools, or the
installer.
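The gating described above can be sketched as a loader that returns null when there is nothing to apply, plus a call site that returns its input untouched in that case. This is a hypothetical sketch: OverrideRules, loadOverrideSketch and applyForceExcludes are illustrative names, not the PR's API, and treating a comment-only file as "no override" is an assumption.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical names throughout; not the PR's loadCodegraphOverride().
interface OverrideRules {
  lines: string[]; // effective, non-comment lines in file order
}

function loadOverrideSketch(root: string): OverrideRules | null {
  const file = join(root, ".codegraphignore");
  if (!existsSync(file)) return null; // no file: callers keep today's behavior
  const lines = readFileSync(file, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
  // Assumption: a file with only blanks/comments also counts as "no override".
  return lines.length > 0 ? { lines } : null;
}

// A git-fast-path call site. Per the routing rule, it only runs when the
// override has no "!" lines, so every line is treated as a literal
// path-prefix exclude (no glob support in this sketch).
function applyForceExcludes(files: string[], override: OverrideRules | null): string[] {
  if (override === null) return files; // gated: the same array comes back unchanged
  const prefixes = override.lines.map((line) => line.replace(/\/$/, ""));
  return files.filter((f) => !prefixes.some((p) => f === p || f.startsWith(p + "/")));
}
```

Returning the identical array on the null path, rather than a filtered copy, makes "no .codegraphignore means today's behavior" easy to pin in a regression test.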

Project-root .codegraphignore is the final authority on indexing scope:
!path force-includes code hidden by the root/nested .gitignore or git
itself, plain lines force-exclude, last match wins. Force-include is
code-aware (dependency/build dirs stay excluded unless named) and routes
the scan to the filesystem walk.

Development: successfully merging this pull request may close the issue "Support indexing gitignored directories via configuration override".
