Performance Ideas
Current state after regex→glob migration. findr beats fd in 3/4 cases.
Benchmark results (2026-06-17)
| Case | fd | findr | Ratio |
|---|---|---|---|
| 1 -E .jj | 172ms | 135ms | 1.27x faster |
| 2 -H | 1.184s | 1.097s | 1.08x faster |
| 3 -HI | 1.251s | 1.670s | 1.34x slower |
| 4 -E .git | 274ms | 202ms | 1.36x faster |
Case 3 (-HI) skips gitignore entirely, so it's pure I/O + allocation. System time is about 2.2x fd's (12.1s vs 5.5s), pointing to syscall/allocation overhead.
Completed
- Per-thread result buffers: each thread accumulates locally and merges once at exit. Eliminates per-result mutex contention.
- Lean path join: `join_path`/`join_path_dir` use a stack buffer + `copy` + a single alloc instead of `strings.Builder` + `fmt.sbprintf` + `clone`.
- Regex→glob migration: replaced the regex NFA with a backtracking glob matcher. Eliminated the 27% of CPU spent in `add_thread`/`is_ignored`. Biggest win.
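
For reference, a backtracking glob matcher of the kind mentioned in the last item can be very small. The sketch below is written in Rust purely for illustration and is not findr's code: the function name `glob_match` is hypothetical, and it assumes a reduced syntax with only `*` (any run of bytes) and `?` (exactly one byte), with no special handling of `/`, `**`, or character classes.

```rust
/// Returns true if `text` matches `pattern`.
/// `*` matches any run of bytes (including none); `?` matches one byte.
/// Hypothetical helper for illustration; gitignore rules need more than this.
fn glob_match(pattern: &[u8], text: &[u8]) -> bool {
    let (mut p, mut t) = (0usize, 0usize);
    // Most recent `*` seen: (index of the star in the pattern,
    // text index where the star's match currently ends).
    let mut star: Option<(usize, usize)> = None;

    while t < text.len() {
        if p < pattern.len() && pattern[p] == b'*' {
            // Remember the star and first try letting it match nothing.
            star = Some((p, t));
            p += 1;
        } else if p < pattern.len() && (pattern[p] == b'?' || pattern[p] == text[t]) {
            // Literal or single-byte wildcard match: advance both sides.
            p += 1;
            t += 1;
        } else if let Some((sp, st)) = star {
            // Mismatch: backtrack and let the last star absorb one more byte.
            p = sp + 1;
            t = st + 1;
            star = Some((sp, st + 1));
        } else {
            // Mismatch with no star to fall back on.
            return false;
        }
    }

    // Text is exhausted; only trailing stars may remain in the pattern.
    while p < pattern.len() && pattern[p] == b'*' {
        p += 1;
    }
    p == pattern.len()
}
```

The matcher keeps only two indices and one saved star position, so it allocates nothing and builds no automaton per pattern. Whether that accounts for the measured win here is not something this sketch can show.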
Remaining ideas
- Larger getdents buffer (8KB → 64KB+): fewer syscalls per directory with many entries. Low effort.
- Eliminate entry name cloning: `strings.clone(name)` in `read_dir_entries` heap-allocates per dirent. Names are valid in the getdents buffer during `process_dir`, so the clone may be unnecessary. Low effort.
- Arena allocator per thread: bump allocator for all transient strings, free once at exit. Bigger change, helps everywhere.
- Batched channel (fd's approach): replace the global results array with a buffered channel of batches. Enables streaming output and sorting like fd does.
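
The batched-channel idea can be sketched as follows. This is a minimal illustration in Rust, not findr's or fd's actual code. The names `spawn_walkers`, `collect_sorted`, and `BATCH_SIZE`, the batch size of 256, and the channel depth of 16 are all assumptions, and the workers generate fake paths instead of walking a directory tree.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Results per batch (hypothetical value; would need tuning).
const BATCH_SIZE: usize = 256;

/// Spawns `workers` threads that each produce `per_worker` fake paths and
/// send them to the returned receiver in batches over a bounded channel.
fn spawn_walkers(workers: usize, per_worker: usize) -> Receiver<Vec<String>> {
    // Bounded channel: senders block once 16 batches are waiting.
    let (tx, rx) = sync_channel::<Vec<String>>(16);
    for w in 0..workers {
        let tx = tx.clone();
        thread::spawn(move || {
            // Thread-local buffer: no synchronization while filling it.
            let mut batch = Vec::with_capacity(BATCH_SIZE);
            for i in 0..per_worker {
                batch.push(format!("dir{}/file{}", w, i));
                if batch.len() == BATCH_SIZE {
                    // One channel operation per BATCH_SIZE results.
                    let full = std::mem::replace(&mut batch, Vec::with_capacity(BATCH_SIZE));
                    if tx.send(full).is_err() {
                        return; // receiver is gone, stop early
                    }
                }
            }
            // Flush the final partial batch.
            if !batch.is_empty() {
                let _ = tx.send(batch);
            }
        });
    }
    // The original `tx` is dropped here, so the receiver sees end-of-stream
    // once every worker has finished and dropped its clone.
    rx
}

/// Drains all batches and returns the results sorted. A streaming consumer
/// would instead print each batch as it arrives.
fn collect_sorted(rx: Receiver<Vec<String>>) -> Vec<String> {
    let mut all: Vec<String> = rx.into_iter().flatten().collect();
    all.sort();
    all
}
```

A consumer could call `collect_sorted(spawn_walkers(4, 1000))` to get all 4000 paths in sorted order, or iterate the receiver directly to stream output while the workers are still running.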