16 KiB
findr — Native Odin File Finder (fd Replacement)
Overview
findr is a native Odin file finder that replaces fd in envr. It supports three ignore modes for A/B benchmarking against specific fd commands, plus a unique "emit ONLY gitignored files" mode that gives envr a single-pass advantage over fd's double-run-and-diff approach.
Directory Structure
findr/
findr.odin # main + CLI (hand-rolled arg parsing)
walker.odin # parallel directory walker (getdents + thread pool)
gitignore.odin # .gitignore parsing + glob→regex transpilation + matching
test_env.odin # test harness: temp dir, mock filesystem, assert helpers
findr_test.odin # integration tests
gitignore_test.odin # transpilation + matching unit tests (22 tests)
CLI Interface
findr [-I] [--ignored] [--no-hidden] [-E <glob>]... [pattern] [path]...
Defaults: include_hidden=true, ignore_mode=.Respected (matches fd's -H behavior).
| fd command | findr equivalent |
|---|---|
fd -a \.env -E ... -HI ~/ |
findr -I -E ... \.env ~/ |
fd -a \.env -E ... -H ~/ |
findr -E ... \.env ~/ |
fd . -H ~/ |
findr ~/ |
fd . -HI ~/ |
findr -I ~/ |
fd . ~/ (no flags) |
findr --no-hidden ~/ |
| (findr original) | findr --ignored ~/ |
Build
odin build findr -o:speed -out:findr/findr
odin test findr
Architecture
Two Orthogonal Axes (matching fd's semantics)
- Hidden files (
.prefix):include_hidden=trueincludes them,falseexcludes them - Gitignore: three modes (see
IgnoreModebelow)
Types
IgnoreMode :: enum {
Respected, // skip gitignored, prune ignored dirs (fd -H default)
All, // ignore .gitignore entirely, descend everywhere (fd -HI)
Ignored, // emit ONLY gitignored files, prune ignored dirs (findr original)
}
WalkOptions :: struct {
pattern: string, // regex on basename; "" = match all
excludes: []string, // glob patterns to skip entirely (fd -E)
include_hidden: bool, // true = include dotfiles (fd -H)
ignore_mode: IgnoreMode,
}
process_dir Filtering Order Per Entry
Each directory traversal carries a WorkItem with the absolute path, a relative path from repo root, and a ^GIContext linked list of gitignore contexts (one per ancestor directory with a .gitignore).
- Skip
.gitdirectory - Load nested
.gitignore: If this directory has a.gitignore, push a newGIContextonto the chain (tracked inpool.all_contextsfor cleanup) - Per entry:
- Skip non-regular files (symlinks, sockets, etc. — parity with
fd -t f) - Excludes: if entry matches any exclude glob → skip entirely
- Hidden: if
!include_hidden && name[0] == '.'→ skip entirely - Gitignore status: check
GIContextchain deepest-to-root viacheck_chain, passing the relative path (not basename). First match wins (correct gitignore precedence). Nested negation overrides parent rules. - Mode-based decision:
- Skip non-regular files (symlinks, sockets, etc. — parity with
| Mode | gitignored file | gitignored dir | normal file | normal dir |
|---|---|---|---|---|
.All |
emit if pattern matches | descend | emit if pattern matches | descend |
.Respected |
skip | prune | emit if pattern matches | descend |
.Ignored |
emit if pattern matches | prune | skip | descend |
Nested repos: When a directory contains .git/, the gitignore context chain is reset (new repo root). The relative path resets to "". Nested repos are always traversed to find deeper repos.
Performance Architecture
- Stat avoidance via
dirent.type— Usescore:sys/linuxgetdents directly, bypassingcore:oswhich callsopenat+fstatper entry. - Prune ignored directories — When a directory matches a gitignore/exclude pattern, it is not descended into.
- Parallel traversal — Worker thread pool with shared LIFO queue and futex-based semaphore signaling. 5.4x speedup over serial on home directory.
Decisions
- Gitignore matching: Transpile gitignore glob patterns to regex, then use
core:text/regex. No dedicated glob matcher. - Pattern matching: Pattern is a regex (same as fd), matched against basename via
regex.match(unanchored search). - Excludes: Glob patterns compiled via the same gitignore transpiler (
parse()). Reuses tested transpilation logic. - Nested gitignore: Every
.gitignorefile within a repo is read, not just the root. Each directory's rules are scoped relative to that directory's path. Negation in a child overrides parent rules (correct gitignore precedence). - Stat avoidance: Use
core:sys/linuxgetdents directly — readdirent.typefrom the kernel, never call stat.DT_UNKNOWNtreated as regular file (correct for ext4/tmpfs; may miss dirs on XFS/BTRFS/FUSE — Phase 7 concern).
Testing Strategy
- In-process integration tests — Tests call
walk()directly (not via subprocess), build mock filesystems in temp dirs, and compare sorted output. - Unit tests — Pure-function tests for glob→regex transpilation and gitignore matching.
- Output sorting for determinism — Always sort output lines before comparison.
- Memory tracking — Odin's test runner reports leaks automatically.
Test Coverage (findr_test.odin)
.Ignored mode (original findr behavior):
| Test | What it covers |
|---|---|
test_basic_gitignored |
Repo with .gitignore, gitignored files emitted, normal files skipped |
test_non_repo_not_scanned |
Dirs without .git/ produce no output |
test_negation_pattern |
!prod.env un-ignores a file |
test_dir_only_pattern |
node_modules/ pattern doesn't emit file results |
test_multiple_repos |
Multiple repos in one tree, each with its own .gitignore |
test_nested_repos |
Repo inside a repo, both scanned independently |
test_no_gitignore_file |
Repo with .git/ but no .gitignore produces nothing |
test_empty_gitignore |
Comments and blank lines only → no results |
test_multiple_search_dirs |
Multiple top-level search dirs in one call |
test_nested_gitignore_read |
Nested .gitignore rules applied (subdir patterns work) |
test_nested_gitignore_negation |
Nested negation overrides parent pattern |
test_multisegment_pattern |
build/output.txt matches relative path, not just basename |
.All mode (fd -HI parity):
| Test | What it covers |
|---|---|
test_all_mode_emits_all_files |
All files emitted regardless of gitignore |
test_all_mode_descends_everywhere |
Gitignored dirs still descended |
.Respected mode (fd -H parity):
| Test | What it covers |
|---|---|
test_respected_mode_skips_gitignored |
Gitignored files skipped |
test_respected_mode_prunes_ignored_dirs |
Gitignored dirs pruned |
test_nested_gitignore_respected_mode |
Nested negation respected in .Respected mode |
Filters:
| Test | What it covers |
|---|---|
test_excludes_prune_dirs |
Excluded dirs not descended |
test_pattern_filters_results |
Only pattern-matching files emitted |
test_no_hidden_skips_dotfiles |
Hidden files skipped when include_hidden=false |
Glob→Regex Transpilation Rules
| Gitignore pattern | Regex | Notes |
|---|---|---|
foo |
`(^ | /)foo(/.*)?$` |
/foo |
^foo(/.*)?$ |
anchored to gitignore dir |
foo/ |
`(^ | /)foo/.*$` |
*.log |
`(^ | /)[^/]*.log$` |
**/foo |
`(^ | /)(./)?foo(/.)?$` |
foo/**/bar |
`(^ | /)foo/(./)?bar(/.)?$` |
!pattern |
(handled by layer) | negation flag, not regex |
#comment |
(skipped) | |
[abc] |
[abc] |
same regex syntax |
? |
[^/] |
single char, no / |
Implementation Phases
Phase 1: Gitignore Transpiler + Tests ✅
22 tests, all passing, zero leaks.
Phase 2: findr Walker + Tests ✅
Parallel DFS using getdents with worker thread pool. 32 total tests pass, zero leaks.
Phase 3: Parallel Traversal ✅
8-worker thread pool, shared LIFO queue, futex-based semaphore. 852ms vs 4.57s serial (5.4x speedup). Serial code removed — parallel is the only implementation.
Phase 4: Benchmark ✅
findr found 227 gitignored files on ~ in 852ms. fd's double-run walked ~1.1M entries.
Phase 5: fd-Parity API ✅
Goal: Make findr replicate specific fd commands for A/B benchmarking, plus keep the unique gitignored-only mode.
Built:
IgnoreModeenum (.Respected,.All,.Ignored) andWalkOptionsstruct- New
walksignature:walk(root, results, opts: WalkOptions, thread_count) - Rewritten
process_dirwith centralized mode-based filtering - Pattern matching via
core:text/regexon basenames - Exclude patterns compiled via existing
gitignore.parse() - CLI arg parsing:
-I,--ignored,--no-hidden,-E <glob> - 7 new integration tests (17 total) covering all three modes, excludes, pattern, and hidden filtering
Result: All tests pass (22 gitignore + 20 walker = 42), zero leaks.
Phase 6: Parity (partially done)
Goal: Achieve file-count parity with fd. An invalid benchmark (different result sets) is useless.
Steps 1-2: Nested gitignore + relative path matching ✅
What was done:
-
Matchenum +check_matchingitignore.odin— Tri-state return (None/Ignored/Unignored) so nested negation overrides work correctly.is_ignoredwraps it as before. -
GIContextlinked list inwalker.odin— Each context holds a^Gitignore,base_rel(relative path from repo root to this dir), andparent: ^GIContext.process_dirloads.gitignorein every directory within a repo (not just roots).check_chainwalks deepest-to-root, first match wins (correct gitignore precedence). -
WorkItemstruct replaced plainstringin the work queue:WorkItem :: struct { path: string, // absolute directory path rel: string, // relative path from repo root ("" = root) gi_ctx: ^GIContext, // gitignore chain (nil = outside any repo) } -
Relative path matching —
check_chainstrips each context'sbase_relprefix to get the locally-scoped relative path. Multi-segment patterns likebuild/output.txtnow match correctly. -
Symlink filtering — Only
DT_REGandDT_UNKNOWNentries are emitted (matchingfd -t f). Symlinks (DT_LNK) are skipped. -
DT_UNKNOWNhandling — Treated as regular files (no stat fallback). Correct for ext4/tmpfs; may miss directories on XFS/BTRFS/FUSE.
Memory management: All GIContext objects tracked in pool.all_contexts (mutex-protected append). Gitignore objects and context structs freed in bulk when walk completes.
Parity achieved (~, 5M+ files):
| Mode | findr | fd equivalent | diff |
|---|---|---|---|
.All (-I) |
5,426,451 | fd -HI -t f --exclude .git |
0 (exact) |
.Respected |
4,442,505 | fd -H -t f --exclude .git |
+1,417 (0.03%) |
--no-hidden |
393,605 | fd -t f --exclude .git |
+17 (0.004%) |
On the envr repo itself, all three modes are exact match (0 diffs). The tiny residual diffs on ~ are likely from global gitignore (~/.config/git/ignore) and .git/info/exclude which fd reads but findr doesn't.
Step 3: DT_UNKNOWN stat fallback (TODO)
On XFS/BTRFS/FUSE filesystems, dirent.type returns DT_UNKNOWN. Currently findr treats these as regular files, which means directories may be missed (not descended into). Add a stat fallback in read_dir_entries when d.type == .UNKNOWN to determine the real type before proceeding. This is not needed for ext4/tmpfs (what tests and most Linux systems use).
Phase 7: Performance Optimization (next)
Goal: Make findr competitive with or faster than fd across all modes. Current benchmark (~, hyperfine 5 runs):
| Command | Mean | vs fd equivalent |
|---|---|---|
findr --ignored |
984ms | (no fd equivalent) |
findr --no-hidden |
542ms | 3.2x slower than fd -t f (170ms) |
findr (respected) |
4.134s | 2.4x slower than fd -H -t f (1.745s) |
findr -I (all) |
3.821s | 1.9x slower than fd -HI -t f (1.972s) |
Bottleneck analysis:
-
Mutex contention on result collection — Every file append goes through
sync.mutex_lock(&pool.results_mutex)→append→sync.mutex_unlock. With 5M+ files across 16 threads, workers serialize on the mutex. -
--ignoredregression — Was 402ms before nested gitignore support, now 984ms. The overhead comes from loading.gitignorein every directory and checking the context chain per entry. Since--ignoredmode prunes gitignored dirs, many of these.gitignoreloads are wasted (the dir won't be descended into anyway). Optimization: skip loading.gitignorefor directories that will be pruned. -
Per-string heap allocation — Every path string is individually
strings.clone'd anddelete'd. Millions of alloc/free calls.
Optimization plan:
-
Per-thread result buffers — Each worker accumulates results in a thread-local
[dynamic]string. Merge into shared array once at the end (single-threaded concat). -
Lazy gitignore loading for
.Ignoredmode — Only load.gitignorewhen we need to decide whether to emit or descend. In.Ignoredmode, we can check the parent context first and skip loading if the directory itself is already ignored. -
Arena allocator for paths — Replace per-string
strings.clonewith a bump allocator. Free everything in onearena_destroyat the end. -
Larger getdents buffer — Increase from 8KB to 64KB to reduce syscall count.
-
BufWriter on stdout — Batch
writesyscalls instead of per-linefmt.println.
Success criteria:
.Allmode faster thanfd -HI -t f --exclude .git.Respectedmode faster thanfd -H -t f --exclude .git--ignoredmode faster thanfd -HI -t f --exclude .git(restore pre-regression advantage)- Re-benchmark after each step using
findr/bench.sh
Phase 8: Integrate into envr
Goal: Replace ALL fd subprocess usage in envr with in-process findr calls. Remove Feature.Fd entirely.
Part A: Rewrite scan_path (scan.odin)
Replace the double-run-and-diff approach with a single findr.walk call using .Ignored mode:
// Before: fd -HI + fd -H, then diff
// After:
findr.walk(search_path, &paths, WalkOptions{
pattern = cfg.ScanConfig.Matcher,
excludes = cfg.ScanConfig.Exclude[:],
include_hidden = true,
ignore_mode = .Ignored,
}, thread_count)
Delete: build_fd_args, run_fd, next_fd_tmp_path, fd_counter, fd_seq, cant_scan.
Part B: Add find_repos and rewrite find_git_roots (config.odin)
Add a find_repos proc to findr that walks a tree and collects directories containing .git/:
find_repos :: proc(root: string, results: ^[dynamic]string, thread_count: int)
- Reuses worker pool architecture
process_diremitsdir_pathwhenhas_git == true- Always descends into subdirs (except
.git) to find nested repos - No gitignore/exclude/pattern processing
Replace find_git_roots's run_fd call with findr.find_repos.
Part C: Remove Feature.Fd everywhere
| File | Change |
|---|---|
features.odin |
Remove Fd from enum, remove fd binary check |
cmd_scan.odin |
Remove feats/cant_scan guard + "install fd" error |
cmd_check.odin |
Same removal |
cmd_deps.odin |
Remove fd table row |
db.odin |
Change check to .Git not_in feats only; update error message |
scan_test.odin |
Remove cant_scan tests and assertions |
Part D: Verification
odin build findr -o:speed -out:findr/findr
odin test findr
odin build . -o:speed -out:envr
odin test .
Risks
| Risk | Mitigation |
|---|---|
Gitignore edge cases (**/foo, foo/**/bar) |
Comprehensive gitignore_test.odin with spec examples |
DT_UNKNOWN on XFS/BTRFS/FUSE |
Phase 6 Step 3: stat fallback for unknown types |
Global gitignore (~/.config/git/ignore) and .git/info/exclude not read |
Causes ~0.03% delta vs fd. Acceptable for envr's use case (finds .env files in repos). |
Thread safety of regex.match on shared Regular_Expression |
Odin regex is read-only after compilation; match returns per-call Captures |