2026-06-16 23:34:58 -04:00
parent 1fc5f8280e
commit 598c622287
2 changed files with 177 additions and 223 deletions


@@ -78,10 +78,10 @@ Key behaviors:
- **Stat avoidance via `dirent.type`** — Uses `core:sys/linux` getdents directly, bypassing `core:os`, which calls `openat` + `fstat` per entry. The file type comes for free from the directory entry.
- **Prune ignored directories** — When a directory matches a gitignore pattern, it is not descended into. Skips potentially thousands of readdir calls.
- **Parallel traversal** — 8-worker thread pool with a shared LIFO queue and futex-based semaphore signaling. 5.4x speedup over serial on the home directory (`~`).
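The stat-avoidance and pruning behaviors can be illustrated with a small serial C sketch. It is not findr's Odin code: it uses glibc's `readdir`, which reads entries via `getdents64` and exposes the kernel's `d_type`, and it only calls `fstatat` when the filesystem reports `DT_UNKNOWN`. The ignore list (`IGNORE_PATTERNS`), `is_ignored`, `walk`, and `struct walk_stats` are hypothetical names; matching is a plain basename glob via `fnmatch`, not real gitignore semantics, and the sketch counts files and pruned directories instead of emitting gitignored paths.

```c
#define _GNU_SOURCE            /* for asprintf */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>             /* AT_SYMLINK_NOFOLLOW */
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical ignore list standing in for parsed .gitignore rules. Only
   basename globs are supported; negation, anchoring and `**` are not modeled. */
static const char *IGNORE_PATTERNS[] = {"node_modules", "dist", "*.egg-info"};

struct walk_stats {
    long files;   /* regular files seen outside ignored directories */
    long pruned;  /* ignored directories skipped without being opened */
};

static int is_ignored(const char *name) {
    for (size_t i = 0; i < sizeof IGNORE_PATTERNS / sizeof IGNORE_PATTERNS[0]; i++)
        if (fnmatch(IGNORE_PATTERNS[i], name, 0) == 0)
            return 1;
    return 0;
}

/* Serial DFS that classifies entries by d_type and only stats an entry
   when the filesystem leaves its type as DT_UNKNOWN. */
static void walk(const char *path, struct walk_stats *s) {
    assert(path != NULL && s != NULL);
    DIR *d = opendir(path);
    if (d == NULL)
        return;                                 /* unreadable directory: skip it */
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        unsigned char type = e->d_type;
        if (type == DT_UNKNOWN) {               /* fallback: one stat for this entry */
            struct stat st;
            if (fstatat(dirfd(d), e->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
                continue;
            type = S_ISDIR(st.st_mode) ? DT_DIR : S_ISREG(st.st_mode) ? DT_REG : DT_UNKNOWN;
        }
        if (type == DT_DIR) {
            if (is_ignored(e->d_name)) {
                s->pruned++;                    /* a real tool would emit the path here */
                continue;                       /* prune: never opened, never read */
            }
            char *child;
            if (asprintf(&child, "%s/%s", path, e->d_name) < 0)
                continue;
            walk(child, s);
            free(child);
        } else if (type == DT_REG) {
            s->files++;
        }
    }
    closedir(d);
}
```

Because a pruned directory is never passed to `opendir`, none of the entries beneath it cost a `readdir` call, which is where the savings on trees full of `node_modules` come from.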
### Future (if needed)
- Work-stealing parallel traversal (per-thread LIFO deques with batch stealing, like fd)
- BufWriter on stdout for large result sets
- Arena allocators for path strings
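For comparison with the work-stealing idea above, the current shared-LIFO design can be sketched in C with pthreads. This is an illustration under assumptions, not findr's implementation: it uses POSIX `sem_t` (which glibc implements on top of futexes on Linux) instead of a hand-rolled futex semaphore, and it walks a hypothetical synthetic binary tree (node `i` has children `2i+1` and `2i+2`) in place of real directories. An `outstanding` counter tracks items pushed but not yet finished; when it reaches zero, the last worker posts one wake-up token per thread so every worker can exit.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define N_WORKERS 8        /* matches the 8-worker pool described above */
#define N_NODES   100000   /* hypothetical synthetic tree size */

/* Shared LIFO work stack guarded by one mutex. */
static int stack[N_NODES];
static int top = 0;
static long outstanding = 0;   /* pushed but not yet fully processed */
static int done = 0;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static sem_t items;            /* one token per queued item, plus N_WORKERS shutdown tokens */
static atomic_long visited;

static void push(int node) {
    pthread_mutex_lock(&mu);
    assert(top < N_NODES);
    stack[top++] = node;
    outstanding++;
    pthread_mutex_unlock(&mu);
    sem_post(&items);          /* wake one idle worker */
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        sem_wait(&items);
        pthread_mutex_lock(&mu);
        if (done) { pthread_mutex_unlock(&mu); return NULL; }
        int node = stack[--top];                   /* LIFO pop keeps the stack shallow */
        pthread_mutex_unlock(&mu);

        atomic_fetch_add(&visited, 1);             /* stand-in for reading a directory */
        for (int c = 2 * node + 1; c <= 2 * node + 2 && c < N_NODES; c++)
            push(c);                               /* stand-in for queuing subdirectories */

        /* Children are pushed before this item is retired, so the count
           cannot hit zero while work remains. */
        pthread_mutex_lock(&mu);
        int finished = (--outstanding == 0);
        if (finished) done = 1;
        pthread_mutex_unlock(&mu);
        if (finished)
            for (int i = 0; i < N_WORKERS; i++) sem_post(&items);
    }
}

static long parallel_walk(void) {
    pthread_t t[N_WORKERS];
    sem_init(&items, 0, 0);
    atomic_store(&visited, 0);
    top = 0; outstanding = 0; done = 0;
    push(0);                                       /* root "directory" */
    for (int i = 0; i < N_WORKERS; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < N_WORKERS; i++) pthread_join(t[i], NULL);
    sem_destroy(&items);
    return atomic_load(&visited);
}
```

Every node is visited exactly once, so `parallel_walk()` returns `N_NODES`. The single mutex is the contention point that per-thread deques with batch stealing would remove.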
@@ -141,7 +141,7 @@ Key behaviors:
**Goal:** Working tool that finds gitignored files in git repos.
**Built:**
-- `walker.odin` — Single-threaded DFS using `core:sys/linux` getdents. Finds repos, reads `.gitignore`, emits gitignored files, recurses into subdirs for nested repos.
+- `walker.odin` — Parallel DFS using `core:sys/linux` getdents with 8-worker thread pool. Finds repos, reads `.gitignore`, emits gitignored files, recurses into subdirs for nested repos.
- `findr.odin` — Minimal CLI: `findr [dirs...]`, no flags.
- `test_env.odin` — Test harness with temp dirs and mock filesystems.
- `findr_test.odin` — 10 integration tests.
@@ -150,16 +150,20 @@ Key behaviors:
---
-### Phase 3: Parallel Traversal (future)
+### Phase 3: Parallel Traversal
**Goal:** Parallelize directory descent for large trees.
+**Result:** Worker pool with shared LIFO queue, 8 threads, futex-based semaphore signaling. 852ms vs 4.57s serial (5.4x speedup) on `~`. Serial code has been removed — parallel is the only implementation.
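As a quick check of the reported figures, assuming both timings are wall-clock runs over the same `~` tree, the speedup and the implied per-thread efficiency for 8 workers work out to:

$$
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{4.57\,\text{s}}{0.852\,\text{s}} \approx 5.36 \approx 5.4\times,
\qquad
E = \frac{S}{8} \approx 0.67
$$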
---
-### Phase 4: Benchmark (future)
+### Phase 4: Benchmark
**Goal:** Quantify performance vs fd on large directory trees.
+**Result:** findr found 227 gitignored files on `~` in 852ms. fd's double run (all files vs. unignored files) walked ~1.1M entries. findr's pruning of ignored directories (node_modules, dist, etc.) gives it a large advantage, because it never descends into them.
---
### Phase 5: Integrate into envr (future)
@@ -170,7 +174,7 @@ Key behaviors:
| Risk | Mitigation |
|---|---|
-| Single-threaded may be slow on huge trees | Add threading in Phase 3 after correctness |
+| Single-threaded may be slow on huge trees | Resolved — parallel traversal implemented (Phase 3) |
| Gitignore edge cases (`**/foo`, `foo/**/bar`) | Comprehensive gitignore_test.odin with spec examples |
| dirent.type may be UNKNOWN on some filesystems | Fall back to stat only when type is UNKNOWN |
| Missing nested `.env` files in monorepos | Accepted limitation — flat gitignore model |