findr — Gitignored File Finder
Overview
findr is a native Odin tool that finds gitignored files within git repositories. It replaces envr's current approach of running fd twice (all files vs. unignored files) and diffing the results.
Simplified scope: findr does one thing: it walks directories, finds git repos, reads each repo's .gitignore, and prints every gitignored file. No flags, no output filtering, no user-supplied patterns. envr handles result filtering itself.
Current fd Usage in envr (being replaced)
- `scan.odin:13-43` (`scan_path`) runs `fd` twice per search path:
  - Run 1: `fd -a <matcher> [-E <exclude>]... -HI <path>` → all files including gitignored
  - Run 2: `fd -a <matcher> [-E <exclude>]... -H <path>` → hidden but NOT gitignored
  - Diff = gitignored files only
- Both go through `run_fd` (`scan.odin:68-118`), which spawns a subprocess and captures output via temp files.
After findr integration, scan_path calls findr.walk(path) directly — no subprocess, no double-run, no diff.
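The diff step that findr makes unnecessary is a plain set difference. Here is a minimal sketch in Python (not the Odin code in `scan.odin`), assuming each `fd` run has already been captured as a list of absolute paths; the function name and sample paths are hypothetical.

```python
def gitignored_only(all_files, unignored_files):
    """Return paths seen in the -HI run but missing from the -H run."""
    unignored = set(unignored_files)
    return sorted(p for p in all_files if p not in unignored)


# Hypothetical captured output of the two fd runs for one search path.
run_hi = ["/repo/.env", "/repo/.env.example", "/repo/src/.env.local"]
run_h = ["/repo/.env.example"]

print(gitignored_only(run_hi, run_h))
# ['/repo/.env', '/repo/src/.env.local']
```

Every search path pays for two full traversals plus this comparison, which is the cost a single `walk` call avoids.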
Directory Structure
findr/
    findr.odin            # main + CLI (positional dir args only)
    walker.odin           # recursive directory walker using core:sys/linux getdents
    gitignore.odin        # .gitignore parsing + glob→regex transpilation + matching
    test_env.odin         # test harness: temp dir, mock filesystem, assert helpers
    findr_test.odin       # integration tests (10 tests)
    gitignore_test.odin   # transpilation + matching unit tests (22 tests)
Decisions
- Scope: findr prints ALL gitignored files. No regex filtering, no exclude patterns, no type filters. envr post-processes the output.
- Gitignore matching: Transpile gitignore glob patterns to regex, then use `core:text/regex`. No dedicated glob matcher.
- Stat avoidance: Use `core:sys/linux` getdents directly: read `dirent.type` from the kernel, never call stat.
- Architecture: Separate directory with its own `main`. Core logic (`walk` proc + `gitignore` package) designed to be importable into envr later.
CLI Interface
findr [dir1] [dir2] ...
No flags. Defaults to . if no dirs given. Prints absolute or relative paths (as given) to stdout, one per line.
Build
odin build findr -o:speed -out:findr/findr
How It Works
walk(dir):
    entries = getdents(dir)                  # via core:sys/linux, zero stat calls
    if entries contains ".git/":
        gi = parse(.gitignore)               # if present
        for entry in entries:
            if entry is gitignored file:
                emit entry path
            if entry is dir (not ignored):
                walk(entry)                  # recurse to find nested repos
    else:
        for entry in entries:
            if entry is dir:
                walk(entry)                  # descend looking for repos
Key behaviors:
- Nested repos: When a repo is found, subdirectories are still traversed to find nested repos. Gitignored directories are pruned (not descended into).
- Flat gitignore: Only the root `.gitignore` is read. `.gitignore` files in subdirectories of a repo are ignored.
- Non-repo dirs: Traversed recursively to find repos. No gitignore rules apply.
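The traversal and the three behaviors above can be sketched as follows. This is an illustrative Python sketch, not the Odin implementation: `os.scandir` plays the role of getdents (its `DirEntry.is_dir` normally answers from the directory entry's type without a stat call), and `load_matcher` is a hypothetical stand-in that matches basenames with `fnmatch` instead of transpiled regexes. Two assumptions go beyond the pseudocode: the repo's rules are carried down into its subdirectories, and `.git/` itself is skipped.

```python
import os
import tempfile
from fnmatch import fnmatchcase


def load_matcher(repo_dir):
    """Hypothetical stand-in matcher: basename globs from the root .gitignore only."""
    rules = []  # (pattern, negated, dir_only), in file order
    try:
        with open(os.path.join(repo_dir, ".gitignore")) as fh:
            for raw in fh:
                line = raw.rstrip()
                if not line or line.startswith("#"):
                    continue
                negated = line.startswith("!")
                pattern = line[1:] if negated else line
                rules.append((pattern.rstrip("/"), negated, pattern.endswith("/")))
    except FileNotFoundError:
        pass  # repo without a .gitignore: nothing is ignored

    def ignored(name, is_dir):
        hit = False
        for pattern, negated, dir_only in rules:
            if dir_only and not is_dir:
                continue
            if fnmatchcase(name, pattern):
                hit = not negated  # last matching rule wins
        return hit

    return ignored


def walk(directory, emit, ignored=None):
    with os.scandir(directory) as it:
        entries = list(it)
    # A directory that contains .git/ is a repo root and gets its own rules.
    if any(e.name == ".git" and e.is_dir(follow_symlinks=False) for e in entries):
        ignored = load_matcher(directory)
    for e in entries:
        is_dir = e.is_dir(follow_symlinks=False)
        if is_dir and e.name == ".git":
            continue  # assumption: repo metadata is never scanned
        if ignored is not None and ignored(e.name, is_dir):
            if not is_dir:
                emit(e.path)
            continue  # ignored directories are pruned, not descended into
        if is_dir:
            walk(e.path, emit, ignored)  # keep looking for nested repos


with tempfile.TemporaryDirectory() as tmp:
    for d in ("proj/.git", "proj/src", "proj/node_modules", "other"):
        os.makedirs(os.path.join(tmp, d))
    files = {
        "proj/.gitignore": "*.env\n!prod.env\nnode_modules/\n",
        "proj/dev.env": "", "proj/prod.env": "", "proj/main.odin": "",
        "proj/node_modules/pkg.env": "", "proj/src/local.env": "",
        "other/stray.env": "",
    }
    for rel, text in files.items():
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write(text)
    found = []
    walk(tmp, found.append)
    found = sorted(os.path.relpath(p, tmp) for p in found)

print(found)  # ['proj/dev.env', 'proj/src/local.env']
```

In the demo, `other/stray.env` is never reported because `other/` is not a repo, `prod.env` is rescued by the negation rule, and nothing under `node_modules/` is visited at all.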
Performance Architecture
Implemented
- Stat avoidance via `dirent.type`: Uses `core:sys/linux` getdents directly, bypassing `core:os`, which calls `openat` + `fstat` per entry. File type comes free from the directory entry.
- Prune ignored directories: When a directory matches a gitignore pattern, it is not descended into. Skips potentially thousands of readdir calls.
Future (if needed)
- Work-stealing parallel traversal (per-thread LIFO deques with batch stealing, like fd)
- BufWriter on stdout for large result sets
- Arena allocators for path strings
Testing Strategy
- In-process integration tests: Tests call `walk()` directly (not via subprocess), build mock filesystems in temp dirs, and compare sorted output.
- Unit tests: Pure-function tests for glob→regex transpilation and gitignore matching.
- Output sorting for determinism: Always sort output lines before comparison.
- Memory tracking: Odin's test runner reports leaks automatically. All 32 tests pass with zero leaks.
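The mock-filesystem and sorted-comparison pattern can be illustrated with a small Python sketch. It is not the contents of `test_env.odin`; the helper names (`make_tree`, `list_files`, `same_paths`) are hypothetical, and `list_files` merely stands in for a call to `walk()`.

```python
import os
import tempfile


def make_tree(root, spec):
    """Create a mock filesystem: keys ending in '/' are directories, others are files."""
    for rel, content in spec.items():
        full = os.path.join(root, rel)
        if rel.endswith("/"):
            os.makedirs(full, exist_ok=True)
        else:
            os.makedirs(os.path.dirname(full), exist_ok=True)
            with open(full, "w") as fh:
                fh.write(content)


def list_files(root):
    """Stand-in for walk(): every file under root, in whatever order the OS yields."""
    out = []
    for dirpath, _dirs, files in os.walk(root):
        out.extend(os.path.relpath(os.path.join(dirpath, f), root) for f in files)
    return out


def same_paths(actual, expected):
    """Order-insensitive comparison: sort both sides before comparing."""
    return sorted(actual) == sorted(expected)


with tempfile.TemporaryDirectory() as tmp:
    make_tree(tmp, {
        "repo/.git/": "",
        "repo/.gitignore": "*.env\n",
        "repo/dev.env": "X=1\n",
    })
    ok = same_paths(list_files(tmp), ["repo/dev.env", "repo/.gitignore"])

print(ok)  # True
```

Sorting both sides matters because directory enumeration order is not guaranteed to be stable across filesystems, so an unsorted comparison could fail intermittently.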
Test Coverage (findr_test.odin)
| Test | What it covers |
|---|---|
| `test_basic_gitignored` | Repo with .gitignore, gitignored files emitted, normal files skipped |
| `test_non_repo_not_scanned` | Dirs without .git/ produce no output |
| `test_negation_pattern` | `!prod.env` un-ignores a file |
| `test_dir_only_pattern` | `node_modules/` pattern doesn't emit file results |
| `test_multiple_repos` | Multiple repos in one tree, each with its own .gitignore |
| `test_nested_repos` | Repo inside a repo, both scanned independently |
| `test_gitignore_in_subdir_ignored` | Subdirectory .gitignore files are not read |
| `test_no_gitignore_file` | Repo with .git/ but no .gitignore produces nothing |
| `test_empty_gitignore` | Comments and blank lines only → no results |
| `test_multiple_search_dirs` | Multiple top-level search dirs in one call |
Gitignore Unit Tests (gitignore_test.odin)
22 tests covering: simple/anchored patterns, *, ?, [abc], [!abc], dot escaping, globstar variants, backslash escapes, empty patterns, basic matching, negation, dir-only, comments, blank lines, last-match-wins, env patterns.
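The rule-level behaviors in that list (comments, blank lines, negation, last-match-wins) sit above the regex itself. Here is a minimal Python sketch of that layer, assuming a simplified matcher: `fnmatchcase` on the basename stands in for the transpiled regex, and `parse_rules` and `is_ignored` are hypothetical names, not functions from `gitignore.odin`.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def parse_rules(text):
    """Turn .gitignore text into (pattern, negated) pairs, skipping comments and blanks."""
    rules = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        rules.append((line[1:] if negated else line, negated))
    return rules


def is_ignored(path, rules):
    """Apply every rule in order; the last rule that matches decides the outcome."""
    ignored = False
    for pattern, negated in rules:
        if fnmatchcase(basename(path), pattern):
            ignored = not negated
    return ignored


rules = parse_rules("# secrets\n*.env\n\n!prod.env\n")
print(rules)                                  # [('*.env', False), ('prod.env', True)]
print(is_ignored("config/dev.env", rules))    # True
print(is_ignored("config/prod.env", rules))   # False
print(is_ignored("README.md", rules))         # False
```

Because later rules override earlier ones, swapping the two pattern lines would leave `prod.env` ignored: the negation would be applied first and then overridden by `*.env`.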
Glob→Regex Transpilation Rules
| Gitignore pattern | Regex | Notes |
|---|---|---|
| `foo` | `(^\|/)foo(/.*)?$` | |
| `/foo` | `^foo(/.*)?$` | anchored to gitignore dir |
| `foo/` | `(^\|/)foo/.*$` | |
| `*.log` | `(^\|/)[^/]*\.log$` | |
| `**/foo` | `(^\|/)(.*/)?foo(/.*)?$` | |
| `foo/**/bar` | `(^\|/)foo/(.*/)?bar(/.*)?$` | |
| `!pattern` | (handled by layer) | negation flag, not regex |
| `#comment` | (skipped) | |
| `[abc]` | `[abc]` | same regex syntax |
| `?` | `[^/]` | single char, no `/` |
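The rules in the table can be exercised with a short transpiler sketch. It is written in Python with the standard `re` module, not Odin's `core:text/regex`, and it is only one way to reproduce the table rows. Two details are assumptions rather than documented behavior: the `(/.*)?` tail is emitted only when the last path segment is a literal name (inferred from the `*.log` row), and a trailing `**` becomes `.*`.

```python
import re


def transpile(pattern):
    """Translate one gitignore glob into a regex string, following the table rows."""
    anchored = pattern.startswith("/")
    if anchored:
        pattern = pattern[1:]
    dir_only = pattern.endswith("/")
    if dir_only:
        pattern = pattern[:-1]

    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**/", i):
            out.append("(.*/)?")          # globstar: zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")              # assumption: trailing ** matches anything
            i += 2
        elif ch == "*":
            out.append("[^/]*")           # any run of characters inside one segment
            i += 1
        elif ch == "?":
            out.append("[^/]")            # exactly one character, never '/'
            i += 1
        elif ch == "[" and pattern.find("]", i + 1) != -1:
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]     # [!abc] becomes [^abc]
            out.append("[" + body + "]")
            i = end + 1
        elif ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # backslash escape: next char literal
            i += 2
        else:
            out.append(re.escape(ch))     # literal character ('.' becomes '\.')
            i += 1

    prefix = "^" if anchored else "(^|/)"
    last_segment = pattern.rsplit("/", 1)[-1]
    if dir_only:
        tail = "/.*$"
    elif not any(c in last_segment for c in "*?["):
        tail = "(/.*)?$"                  # literal name: also covers its contents
    else:
        tail = "$"
    return prefix + "".join(out) + tail


print(transpile("*.log"))        # (^|/)[^/]*\.log$
print(transpile("foo/**/bar"))   # (^|/)foo/(.*/)?bar(/.*)?$
print(bool(re.search(transpile("/foo"), "a/foo")))  # False: anchored pattern
```

Paths are matched relative to the directory that holds the `.gitignore`, which is why an anchored pattern compiles to a leading `^` while an unanchored one may also start after any `/`.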
Implementation Phases
Phase 1: Gitignore Transpiler + Tests ✅
Goal: Isolated, fully-tested glob→regex transpiler.
Result: 22 tests, all passing, zero leaks.
Phase 2: findr Walker + Tests ✅
Goal: Working tool that finds gitignored files in git repos.
Built:
- `walker.odin`: Single-threaded DFS using `core:sys/linux` getdents. Finds repos, reads `.gitignore`, emits gitignored files, recurses into subdirs for nested repos.
- `findr.odin`: Minimal CLI (`findr [dirs...]`, no flags).
- `test_env.odin`: Test harness with temp dirs and mock filesystems.
- `findr_test.odin`: 10 integration tests.
Result: All 32 tests pass (22 gitignore + 10 walker), zero leaks.
Phase 3: Parallel Traversal (future)
Goal: Parallelize directory descent for large trees.
Phase 4: Benchmark (future)
Goal: Quantify performance vs fd on large directory trees.
Phase 5: Integrate into envr (future)
Goal: Replace run_fd in scan.odin. scan_path calls findr.walk() directly instead of two subprocess runs + diff.
Risks
| Risk | Mitigation |
|---|---|
| Single-threaded may be slow on huge trees | Add threading in Phase 3 after correctness |
| Gitignore edge cases (`**/foo`, `foo/**/bar`) | Comprehensive `gitignore_test.odin` with spec examples |
| dirent.type may be UNKNOWN on some filesystems | Fall back to stat only when type is UNKNOWN |
| Missing nested `.env` files in monorepos | Accepted limitation: flat gitignore model |
| Memory allocation churn from path strings | Use thread-local arena allocators in Phase 3 |