Compare commits
10 Commits
c772a1ba27...ce73726c71
| SHA1 |
|---|
| ce73726c71 |
| dd157ccb52 |
| 26a6028ae6 |
| 2eecf7c348 |
| ed8a43ecff |
| 90e02a46bd |
| 32d1a2b818 |
| 505da344e0 |
| e5eac93071 |
| 488a13004b |
5
.gitignore
vendored
Normal file
@@ -0,0 +1,5 @@
+/findr
+/findr-prof
+*.spall
+bench-output-*.md
+bench-results.md
@@ -1,27 +1,111 @@
-findr is ~2.3x slower than fd (case 1: 547ms vs 241ms). Opportunities:
+# Performance Ideas
 
-1. Per-thread result buffers (DONE)
-Each thread accumulates results locally, then merges once at exit. Eliminates per-result mutex contention.
+Current state after regex→glob migration + inline entry processing + skip gitignore in .All mode + channel-based streaming output + byte-buffer output. findr beats fd in 4/4 cases.
 
-2. Batched channel (fd's approach)
-Replace global results array + merge with a buffered channel of batches. Each worker fills a local batch (~256 items), sends it to a `chan.Chan([]string)` (capacity = 2 × threads). A receiver thread drains batches and collects/prints. Provides backpressure, streaming output, and per-batch (not global) synchronization. Enables sorting like fd does (buffer first 1000 results or 100ms, then stream).
+## Benchmark results (2026-06-17, post-byte-buffer)
 
-3. Path allocation waste (join_path/join_path_dir)
-Every path construction spins up a strings.Builder, does fmt.sbprintf, to_string, clone, then builder_destroy — 2 heap allocs + 2 frees per path. Could be a simple memcpy into a stack buffer with a single alloc.
+| Case | fd | findr | Ratio |
+|------|------|-------|-------|
+| 1 `-E .jj` | 148ms | 99ms | **1.50x faster** |
+| 2 `-H` | 1.142s | 609ms | **1.88x faster** |
+| 3 `-HI` | 1.009s | 966ms | **1.04x faster** |
+| 4 `-E .git` | 268ms | 197ms | **1.36x faster** |
 
-4. Larger getdents buffer
-Currently 8KB. Increasing to 64KB+ means fewer syscalls per directory with many entries.
+Byte-buffer output eliminated per-result string allocations. Workers now write `path\n` directly into `[]u8` buffers sent through the channel; the output writer does a single bulk write per batch. Case 3 (`-HI`, 5.6M entries) flipped from 1.12x slower to 1.04x faster — the biggest win since it has the most output.
 
-5. Eliminate entry name cloning
-strings.clone(name) in read_dir_entries heap-allocates per dirent. Names are valid in the getdents buffer during process_dir, so the clone may be unnecessary.
+## Completed
 
-6. Arena allocator per thread
-Replace the default allocator for transient strings with a bump allocator — allocate in bulk, free all at once.
-2. Path allocation waste (join_path/join_path_dir)
-Every path construction spins up a strings.Builder, does fmt.sbprintf, to_string, clone, then builder_destroy — 2 heap allocs + 2 frees per path. Could be a simple memcpy into a stack buffer with a single alloc.
-3. Larger getdents buffer
-Currently 8KB. Increasing to 64KB+ means fewer syscalls per directory with many entries.
-4. Eliminate entry name cloning
-strings.clone(name) in read_dir_entries heap-allocates per dirent. Names are valid in the getdents buffer during process_dir, so the clone may be unnecessary.
-5. Arena allocator per thread
-Replace the default allocator for transient strings with a bump allocator — allocate in bulk, free all at once.
+1. **Per-thread result buffers** — each thread accumulates locally, merges once at exit. Eliminates per-result mutex contention.
+2. **Lean path join** — `join_path`/`join_path_dir` use stack buffer + `copy` + single alloc instead of `strings.Builder` + `fmt.sbprintf` + `clone`.
+3. **Regex→glob migration** — replaced regex NFA with backtracking glob matcher. Eliminated 27% of CPU spent on `add_thread`/`is_ignored`. Biggest win.
+4. **32KB getdents buffer** — bumped from 8KB. Marginal improvement, within noise.
+5. **Skip gitignore loading in `.All` mode** — eliminated thousands of unnecessary file opens/parses in `-HI`. Cut system time 34% (12.4s → 8.2s).
+6. **Fixed-size threads slice** — replaced `[dynamic]^thread.Thread` with `[]^thread.Thread` since thread count is known upfront.
+7. **Inline entry processing** — merged `read_dir_entries` into `process_dir`. Entry names consumed directly from getdents buffer via `dirent_name(d)` views. Eliminated millions of `strings.clone`/`delete` pairs. User time dropped 38% in `-HI` case.
+8. **Skip `has_git_dir` probe in `.All` mode** — guarded `has_git_dir(fd)` with `ignore_mode != .All`. Eliminated ~280K wasted `openat` ENOENT probes in `-HI` case. System time dropped 33% (11.3s → 7.6s).
+9. **Channel-based streaming output** — replaced global results array + mutex with `chan.Chan([]string)`, cap `2 * thread_count`. Workers flush 256-result batches through the channel; a consumer thread drains to stdout. Matches fd's architecture (`crossbeam_channel::bounded(2*threads)`, batch size `0x100`). Eliminates the collect-then-write barrier. Cases 1/2/4 went from 1.1-1.3x faster to 1.3-1.7x faster.
+10. **Byte-buffer output** — replaced `chan.Chan([]string)` with `chan.Chan([]u8)`. Workers write `path\n` directly into 64KB byte buffers via `append_path`; output writer does a single bulk `writer_write` per batch. Eliminates ~5M `join_path` allocs, ~5M `delete(s)` frees, ~20K batch array allocs. Case 3 (`-HI`) flipped from 1.12x slower to 1.04x faster. All 4 cases now beat fd.
+
+## fd vs findr architecture comparison
+
+| Aspect | fd (ignore crate) | findr |
+|--------|-------------------|-------|
+| Syscall | `libc::readdir` | raw `getdents64` |
+| Entry names | Clones into owned `PathBuf` per entry | Zero-copy view from getdents buffer |
+| `.git` detection | `stat(".git")` per directory | `openat(fd, ".git")` probe per directory |
+| Gitignore setup | Before entry iteration | Before entry iteration |
+| Path traversal | Full paths | Full paths |
+| Glob matching | globset stratification (literals→hash, complex→regex) | Backtracking token matcher |
+| Result transport | `crossbeam_channel::bounded(2*threads)` (lock-free MPMC) | `core:sync/chan` (single-mutex ring buffer) |
+| Batching | `Arc<Mutex<Option<Vec>>>` shared buffer, flush on first item | 64KB `[]u8` byte buffers, flush when full |
+| Output mode | Hybrid: buffer 1000 items / 100ms → sort → stream | Bulk byte writes, direct streaming (no buffer/sort mode yet) |
+
+## Known problems
+
+1. **Allocator efficiency gap** — findr still allocates 1-3 heap strings per entry (`join_path` results, work item paths). fd does the same but benefits from Rust's allocator. Odin's default allocator may have higher per-allocation overhead.
+
+2. **Channel mutex contention (unconfirmed)** — Odin's `core:sync/chan` uses a single mutex for the entire ring buffer. With 16 senders + 1 receiver hitting the same lock, every `chan.send`/`chan.recv` is a potential futex contention point. fd uses `crossbeam_channel::bounded` which is lock-free MPMC. **Note**: early spall profiles showed 11.8% futex_wait, but this was likely a profiling artifact — the channel ops generate more instrumentation events, causing the 1GB spall cap to be hit over a longer wall-time window (3.5s vs 1s), skewing the profile. Needs a fair comparison (smaller tree or larger cap) to confirm whether this is real.
+
+## Remaining ideas
+
+### Allocation strategies
+
+Allocation audit (per-entry hot path in `process_dir`):
+
+| Site | What | Est. count (-HI) |
+|------|------|-------------------|
+| `join_path`/`join_path_dir` for results | `make([]u8, total)` for result paths | ~5M |
+| `join_path` for WorkItem paths | same, for recursed dirs | ~500K |
+| `strings.clone(entry_rel)` | clone for WorkItem.rel | ~500K |
+| `clone_to_c_string(dir_path)` | cstring for `open()` | ~500K |
+| `flush_batch` → `make([dynamic]string)` | new batch array | ~20K |
+| `delete(s)` per result | free in output writer | ~5M |
+
+Available Odin allocators: `core:mem` (Arena, Dynamic_Arena, Stack, etc.), `core:mem/tlsf` (TLSF — O(1) alloc/free, supports individual frees, grows via backing allocator).
+
+1. **Byte-buffer output — eliminate result path allocations entirely** *(COMPLETED — see #10 in Completed)*
+
+2. **Stack-buffer cstring for `open()`**
+Replace `strings.clone_to_c_string(dir_path)` + `delete(cpath)` with a stack buffer copy:
+```odin
+cbuf: [4096]u8
+copy(cbuf[:], dir_path)
+cbuf[len(dir_path)] = 0
+fd, err := linux.open(cstring(raw_data(&cbuf[0])), ...)
+```
+
+**Eliminates**: ~500K heap allocs for cstrings. Trivial change.
+
+3. **Arena for WorkItem paths**
+Use a `Dynamic_Arena` or virtual-memory bump allocator for `join_path` results and `clone(entry_rel)` in WorkItems. Remove individual `delete(item.path)` / `delete(item.rel)` calls. Free arena once at end of `walk_stream`.
+
+**Eliminates**: ~1M individual alloc/free pairs for WorkItem paths/rels.
+
+**Challenge**: WorkItems cross thread boundaries via the queue, so the arena must be shared. A shared `Dynamic_Arena` needs synchronization on the bump pointer. Cleanest approach: `core:mem/virtual` to reserve a large address space (e.g. 256MB) and do `atomic_add_explicit(&offset, size, .Acquire)` for lock-free bump allocation.
+
+4. **TLSF as global allocator**
+Swap `context.allocator` to TLSF at program start. O(1) alloc/free with good cache locality. ~5 lines of code. Best as a fallback if strategies 1-3 don't fully close the gap.
+
+### Other ideas
+
+5. **Lock-free MPMC queue**
+Replace Odin's mutex-based channel with a custom multi-producer-single-consumer ring buffer using atomics. Eliminates all futex syscalls on the result-transport hot path.
+
+**Design**:
+- Fixed-capacity ring buffer of `[]u8` slots (cap = `2 * thread_count`, same as now)
+- Producer side: each worker atomic-CASes a `head` counter forward to claim a slot index, writes its batch, then sets a `ready` flag on the slot
+- Consumer side: atomic-load `head`, drains all ready slots up to `head`, writes to stdout, frees batches
+- Backpressure: if `head - tail >= cap`, producer spins/waits (yields via `sched_yield` or `futex` with private flag)
+- Close: atomic flag set by `walk_stream` after all workers joined; consumer drains remaining then exits
+
+**Alternative**: Use a per-producer SPSC queue (one ring per worker thread). Consumer round-robins across all N queues. No CAS on producer side — each worker writes to its own queue with only a `store` + fence. Consumer reads from each with a `load`. Trades simplicity for zero contention.
+
+**Risk**: Low. The API surface is small (`send`, `recv`, `close`). Can be swapped behind the existing `flush_batch` interface without touching `walk_worker` or `output_writer`. fd's `crossbeam_channel` proves lock-free MPMC is achievable.
+
+**Effort**: Medium. ~100-150 lines for the queue + a few tests. No changes to walker or main.
+
+6. **Buffer/sort output mode** (fd's approach)
+Buffer up to 1000 results (or 100ms deadline), sort them, then switch to streaming. Gives sorted output for small searches without sacrificing throughput on large ones. fd's `ReceiverMode::Buffering → Streaming` pattern.
+
+7. **Git index parsing**
+Parse `.git/index` binary format to show tracked dotfiles. Closes the 84-file correctness delta in cases 1/4. Last correctness gap.
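The lock-free bump allocation proposed under "Arena for WorkItem paths" in the notes above could look roughly like the following. This is a minimal sketch and not code from this diff: the `Bump` struct and `bump_alloc` are hypothetical names, a plain heap slice stands in for the `core:mem/virtual` reservation, and alignment is ignored.

```odin
package bump_sketch

import "core:fmt"
import "core:sync"

// Hypothetical shared bump region. In findr the backing memory would be a
// large virtual-memory reservation; a heap slice stands in for it here.
Bump :: struct {
	buf:    []u8,
	offset: int, // next free byte, only ever advanced atomically
}

// bump_alloc claims `size` bytes without taking a lock.
// atomic_add_explicit returns the previous value of `offset`, which is the
// start of the range this caller now owns. Returns nil once the region is
// exhausted (the offset still advances in that case, so nothing is reused).
bump_alloc :: proc(b: ^Bump, size: int) -> []u8 {
	start := sync.atomic_add_explicit(&b.offset, size, .Acquire)
	if start + size > len(b.buf) {
		return nil
	}
	return b.buf[start:start + size]
}

main :: proc() {
	backing := make([]u8, 1024)
	defer delete(backing) // one free for everything handed out

	b := Bump{buf = backing}
	p := bump_alloc(&b, 16)
	q := bump_alloc(&b, 32)
	fmt.println(len(p), len(q), b.offset)
}
```

Because every caller gets a disjoint range from a single atomic add, worker threads can allocate WorkItem paths concurrently, and the whole region is released in one step at the end of the walk.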
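The "Buffer/sort output mode" idea in the notes above can be sketched as a small receiver state machine. This is an illustration under assumptions, not findr code: `Receiver`, `receive`, `finish` and `emit` are hypothetical names, results are plain strings rather than byte batches, and the deadline is only checked when a result arrives (fd's receiver also wakes on a timeout).

```odin
package sort_sketch

import "core:fmt"
import "core:slice"
import "core:time"

MAX_BUFFERED :: 1000                    // threshold taken from the notes
DEADLINE     :: 100 * time.Millisecond  // deadline taken from the notes

Receiver :: struct {
	buffered:  [dynamic]string,
	streaming: bool,
	started:   time.Tick,
}

// emit stands in for the real output writer.
emit :: proc(s: string) {
	fmt.println(s)
}

// flush_sorted sorts whatever has been buffered and writes it out.
flush_sorted :: proc(r: ^Receiver) {
	slice.sort(r.buffered[:])
	for s in r.buffered do emit(s)
	clear(&r.buffered)
}

// receive buffers results until the count or time limit is hit, then
// flushes the sorted buffer once and switches to streaming for good.
receive :: proc(r: ^Receiver, path: string) {
	if r.streaming {
		emit(path)
		return
	}
	append(&r.buffered, path)
	if len(r.buffered) >= MAX_BUFFERED || time.tick_since(r.started) >= DEADLINE {
		flush_sorted(r)
		r.streaming = true
	}
}

// finish handles the small-search case where streaming never started.
finish :: proc(r: ^Receiver) {
	if !r.streaming do flush_sorted(r)
	delete(r.buffered)
}

main :: proc() {
	r := Receiver{started = time.tick_now()}
	paths := []string{"b/x", "a/y", "c/z"}
	for p in paths do receive(&r, p)
	finish(&r)
}
```

Small searches finish inside the buffering phase and come out fully sorted; large ones pay for one sort of at most 1000 entries and then stream unsorted.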
54
findr.odin
@@ -1,9 +1,29 @@
 package findr
 
 import "core:bufio"
-import "core:fmt"
 import "core:os"
 import "core:strings"
+import "core:sync/chan"
+import "core:thread"
 
+Writer_Data :: struct {
+	ch: chan.Chan([]u8),
+}
+
+output_writer :: proc(t: ^thread.Thread) {
+	data := cast(^Writer_Data)t.data
+
+	w: bufio.Writer
+	bufio.writer_init(&w, os.to_stream(os.stdout), 1 << 13)
+	defer bufio.writer_destroy(&w)
+
+	for {
+		batch := chan.recv(data.ch) or_break
+		bufio.writer_write(&w, batch)
+		delete(batch)
+	}
+	bufio.writer_flush(&w)
+}
+
 main :: proc() {
 	prof_init()
@@ -70,22 +90,24 @@ main :: proc() {
 		append(&paths, ".")
 	}
 
-	results := make([dynamic]string)
-	defer {
-		for r in results {delete(r)}
-		delete(results)
-	}
 
 	thread_count := os.get_processor_core_count()
-	walk(paths[:], &results, opts, thread_count)
 
-	w: bufio.Writer
-	bufio.writer_init(&w, os.to_stream(os.stdout), 1 << 13)
-	defer bufio.writer_destroy(&w)
+	ch, _ := chan.create(chan.Chan([]u8), max(2 * thread_count, 2), context.allocator)
+	defer chan.destroy(ch)
 
-	for r in results {
-		bufio.writer_write_string(&w, r)
-		bufio.writer_write_byte(&w, '\n')
-	}
-	bufio.writer_flush(&w)
+	wdata := new(Writer_Data)
+	wdata.ch = ch
+	defer free(wdata)
+	writer := thread.create(output_writer)
+	writer.data = rawptr(wdata)
+	writer.init_context = context
+	thread.start(writer)
+
+	walk_stream(paths[:], ch, opts, thread_count)
+
+	chan.close(ch)
+	thread.join(writer)
+	thread.destroy(writer)
 }
+
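The `chan.Chan([]u8)` transport above is the piece the notes suggest replacing with per-producer SPSC rings. One such ring could be sketched as below. This is an assumption-laden illustration, not part of the diff: `Spsc`, `try_send` and `try_recv` are hypothetical names, the capacity is fixed at compile time, and cache-line padding between `head` and `tail` is left out.

```odin
package spsc_sketch

import "core:fmt"
import "core:sync"

CAP :: 8 // hypothetical slot count; must be a power of two for the index mask

// One ring per worker thread: exactly one producer and one consumer.
Spsc :: struct {
	slots: [CAP][]u8,
	head:  u64, // written only by the producer
	tail:  u64, // written only by the consumer
}

// try_send publishes a batch, or returns false when the ring is full.
// The release store on `head` makes the slot write visible to the consumer.
try_send :: proc(q: ^Spsc, batch: []u8) -> bool {
	head := sync.atomic_load_explicit(&q.head, .Relaxed)
	tail := sync.atomic_load_explicit(&q.tail, .Acquire)
	if head - tail == CAP do return false
	q.slots[head & (CAP - 1)] = batch
	sync.atomic_store_explicit(&q.head, head + 1, .Release)
	return true
}

// try_recv takes the oldest batch, or returns ok = false when empty.
// The release store on `tail` hands the slot back to the producer.
try_recv :: proc(q: ^Spsc) -> (batch: []u8, ok: bool) {
	tail := sync.atomic_load_explicit(&q.tail, .Relaxed)
	head := sync.atomic_load_explicit(&q.head, .Acquire)
	if tail == head do return nil, false
	batch = q.slots[tail & (CAP - 1)]
	sync.atomic_store_explicit(&q.tail, tail + 1, .Release)
	return batch, true
}

main :: proc() {
	q: Spsc
	payload := []u8{'a', '\n'}
	sent := try_send(&q, payload)
	batch, ok := try_recv(&q)
	fmt.println(sent, ok, len(batch))
}
```

With one ring per worker, the writer thread would poll the rings in turn. Neither side needs a compare-and-swap because each counter has a single writer.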
155
gitignore.odin
@@ -1,112 +1,36 @@
 package findr
 
-import "core:fmt"
 import "core:strings"
-import "core:text/regex"
 
-// FIXME: Use a const bit_set[0..<128; u128] here when we start doing optimizations
-is_regex_meta :: proc(c: u8) -> bool {
-	switch c {
-	case '.', '+', '(', ')', '{', '}', '^', '$', '|', '#':
-		return true
-	}
-	return false
-}
-
-glob_to_regex :: proc(pattern: string, anchored: bool) -> string {
-	// TODO: Attempt to pre-allocate the string builder when we start doing optimizations
-	sb: strings.Builder
-	strings.builder_init(&sb)
-	defer strings.builder_destroy(&sb)
-
-	if anchored {
-		fmt.sbprintf(&sb, "^")
-	} else {
-		fmt.sbprintf(&sb, "(^|/)")
-	}
-
-	i := 0
-	for i < len(pattern) {
-		c := pattern[i]
-
-		if c == '*' {
-			if i + 1 < len(pattern) && pattern[i + 1] == '*' {
-				prev_slash := i == 0 || pattern[i - 1] == '/'
-				at_end := i + 2 >= len(pattern)
-				next_slash := !at_end && pattern[i + 2] == '/'
-
-				if prev_slash && (next_slash || at_end) {
-					if next_slash {
-						i += 3
-						fmt.sbprintf(&sb, "(.*/)?")
-					} else {
-						i += 2
-						fmt.sbprintf(&sb, ".*")
-					}
-				} else {
-					fmt.sbprintf(&sb, "[^/]*")
-					i += 2
-				}
-			} else {
-				fmt.sbprintf(&sb, "[^/]*")
-				i += 1
-			}
-		} else if c == '?' {
-			fmt.sbprintf(&sb, "[^/]")
-			i += 1
-		} else if c == '[' {
-			append(&sb.buf, '[')
-			i += 1
-			if i < len(pattern) && pattern[i] == '!' {
-				append(&sb.buf, '^')
-				i += 1
-			}
-			if i < len(pattern) && pattern[i] == ']' {
-				append(&sb.buf, ']')
-				i += 1
-			}
-			for i < len(pattern) && pattern[i] != ']' {
-				append(&sb.buf, pattern[i])
-				i += 1
-			}
-			if i < len(pattern) {
-				append(&sb.buf, ']')
-				i += 1
-			}
-		} else if c == '\\' {
-			i += 1
-			if i < len(pattern) {
-				if is_regex_meta(pattern[i]) {
-					append(&sb.buf, '\\')
-				}
-				append(&sb.buf, pattern[i])
-				i += 1
-			}
-		} else if is_regex_meta(c) {
-			append(&sb.buf, '\\')
-			append(&sb.buf, c)
-			i += 1
-		} else {
-			append(&sb.buf, c)
-			i += 1
-		}
-	}
-
-	fmt.sbprintf(&sb, "$")
-
-	s := strings.to_string(sb)
-	result, _ := strings.clone(s)
-	return result
+Gitignore :: struct {
+	rules: [dynamic]Rule,
 }
 
 Rule :: struct {
-	regex: regex.Regular_Expression,
+	pattern: GlobPattern,
 	negated: bool,
 	dir_only: bool,
 }
 
-Gitignore :: struct {
-	rules: [dynamic]Rule,
+Match :: enum {
+	None,
+	Ignored,
+	Unignored,
+}
+
+is_ignored :: proc(gi: ^Gitignore, path: string, is_dir: bool) -> bool {
+	return check_match(gi, path, is_dir) == .Ignored
+}
+
+check_match :: proc(gi: ^Gitignore, path: string, is_dir: bool) -> Match {
+	result := Match.None
+	for &rule in gi.rules {
+		if rule.dir_only && !is_dir do continue
+		if glob_match_compiled(&rule.pattern, path) {
+			result = rule.negated ? .Unignored : .Ignored
+		}
+	}
+	return result
 }
 
 parse :: proc(content: string) -> Gitignore {
@@ -148,43 +72,16 @@ parse :: proc(content: string) -> Gitignore {
 
 		if len(s) == 0 do continue
 
-		regex_str := glob_to_regex(s, anchored)
-		re, err := regex.create(regex_str, {regex.Flag.No_Capture})
-		delete(regex_str)
-		if err != nil do continue
-
-		append(&gi.rules, Rule{regex = re, negated = negated, dir_only = dir_only})
+		gp := glob_compile(s, anchored)
+		append(&gi.rules, Rule{pattern = gp, negated = negated, dir_only = dir_only})
 	}
 
 	return gi
 }
 
-Match :: enum {
-	None,
-	Ignored,
-	Unignored,
-}
-
-check_match :: proc(gi: ^Gitignore, path: string, is_dir: bool) -> Match {
-	result := Match.None
-	for rule in gi.rules {
-		if rule.dir_only && !is_dir do continue
-		cap, ok := regex.match(rule.regex, path)
-		regex.destroy(cap)
-		if ok {
-			result = rule.negated ? .Unignored : .Ignored
-		}
-	}
-	return result
-}
-
-is_ignored :: proc(gi: ^Gitignore, path: string, is_dir: bool) -> bool {
-	return check_match(gi, path, is_dir) == .Ignored
-}
-
 destroy :: proc(gi: ^Gitignore) {
-	for rule in gi.rules {
-		regex.destroy(rule.regex)
+	for &rule in gi.rules {
+		glob_destroy(&rule.pattern)
 	}
 	delete(gi.rules)
 }
@@ -4,100 +4,103 @@ import "core:testing"
 
 @(test)
 test_glob_simple :: proc(t: ^testing.T) {
-	result := glob_to_regex("foo", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)foo$")
+	testing.expect(t, glob_match("foo", "foo", false))
+	testing.expect(t, glob_match("foo", "bar/foo", false))
+	testing.expect(t, !glob_match("foo", "foobar", false))
+	testing.expect(t, !glob_match("foo", "foo/bar", false))
 }
 
 @(test)
 test_glob_anchored :: proc(t: ^testing.T) {
-	result := glob_to_regex("foo", true)
-	defer delete(result)
-	testing.expect_value(t, result, "^foo$")
+	testing.expect(t, glob_match("foo", "foo", true))
+	testing.expect(t, !glob_match("foo", "bar/foo", true))
+	testing.expect(t, !glob_match("foo", "foobar", true))
 }
 
 @(test)
 test_glob_star :: proc(t: ^testing.T) {
-	result := glob_to_regex("*.log", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)[^/]*\\.log$")
+	testing.expect(t, glob_match("*.log", "test.log", false))
+	testing.expect(t, glob_match("*.log", ".log", false))
+	testing.expect(t, !glob_match("*.log", "test.txt", false))
+	testing.expect(t, !glob_match("*.log", "dir/test", false))
 }
 
 @(test)
 test_glob_question :: proc(t: ^testing.T) {
-	result := glob_to_regex("?.log", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)[^/]\\.log$")
+	testing.expect(t, glob_match("?.log", "a.log", false))
+	testing.expect(t, !glob_match("?.log", "ab.log", false))
+	testing.expect(t, !glob_match("?.log", ".log", false))
 }
 
 @(test)
 test_glob_char_class :: proc(t: ^testing.T) {
-	result := glob_to_regex("[abc].log", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)[abc]\\.log$")
+	testing.expect(t, glob_match("[abc].log", "a.log", false))
+	testing.expect(t, glob_match("[abc].log", "b.log", false))
+	testing.expect(t, !glob_match("[abc].log", "d.log", false))
 }
 
 @(test)
 test_glob_negated_class :: proc(t: ^testing.T) {
-	result := glob_to_regex("[!abc].log", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)[^abc]\\.log$")
+	testing.expect(t, glob_match("[!abc].log", "d.log", false))
+	testing.expect(t, !glob_match("[!abc].log", "a.log", false))
 }
 
 @(test)
-test_glob_dot_escaped :: proc(t: ^testing.T) {
-	result := glob_to_regex(".env", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)\\.env$")
+test_glob_dot_literal :: proc(t: ^testing.T) {
+	testing.expect(t, glob_match(".env", ".env", false))
+	testing.expect(t, glob_match(".env", "dir/.env", false))
+	testing.expect(t, !glob_match(".env", "env", false))
+	testing.expect(t, !glob_match(".env", "x.env", false))
 }
 
 @(test)
 test_glob_globstar_prefix :: proc(t: ^testing.T) {
-	result := glob_to_regex("**/foo", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)(.*/)?foo$")
+	testing.expect(t, glob_match("**/foo", "foo", false))
+	testing.expect(t, glob_match("**/foo", "a/b/foo", false))
+	testing.expect(t, !glob_match("**/foo", "foobar", false))
+	testing.expect(t, !glob_match("**/foo", "a/foobar", false))
 }
 
 @(test)
 test_glob_globstar_suffix :: proc(t: ^testing.T) {
-	result := glob_to_regex("abc/**", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)abc/.*$")
+	testing.expect(t, glob_match("abc/**", "abc/x", false))
+	testing.expect(t, glob_match("abc/**", "abc/x/y", false))
+	testing.expect(t, !glob_match("abc/**", "abc", false))
+	testing.expect(t, !glob_match("abc/**", "abcd/x", false))
 }
 
 @(test)
 test_glob_globstar_middle :: proc(t: ^testing.T) {
-	result := glob_to_regex("foo/**/bar", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)foo/(.*/)?bar$")
+	testing.expect(t, glob_match("foo/**/bar", "foo/bar", false))
+	testing.expect(t, glob_match("foo/**/bar", "foo/x/bar", false))
+	testing.expect(t, !glob_match("foo/**/bar", "foo/barx", false))
+	testing.expect(t, !glob_match("foo/**/bar", "foo/x/y/baz", false))
 }
 
 @(test)
 test_glob_backslash_escape :: proc(t: ^testing.T) {
-	result := glob_to_regex("\\!foo", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)!foo$")
+	testing.expect(t, glob_match("\\!foo", "!foo", false))
+	testing.expect(t, !glob_match("\\!foo", "foo", false))
 }
 
 @(test)
-test_glob_hash_escaped :: proc(t: ^testing.T) {
-	result := glob_to_regex("#foo", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)\\#foo$")
+test_glob_hash_literal :: proc(t: ^testing.T) {
+	testing.expect(t, glob_match("#foo", "#foo", false))
+	testing.expect(t, !glob_match("#foo", "foo", false))
 }
 
 @(test)
-test_glob_hash_in_pattern :: proc(t: ^testing.T) {
-	result := glob_to_regex("#*#", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)\\#[^/]*\\#$")
+test_glob_hash_pattern :: proc(t: ^testing.T) {
+	testing.expect(t, glob_match("#*#", "#test#", false))
+	testing.expect(t, glob_match("#*#", "##", false))
+	testing.expect(t, !glob_match("#*#", "test", false))
+	testing.expect(t, !glob_match("#*#", "#test", false))
 }
 
 @(test)
 test_glob_empty :: proc(t: ^testing.T) {
-	result := glob_to_regex("", false)
-	defer delete(result)
-	testing.expect_value(t, result, "(^|/)$")
+	testing.expect(t, glob_match("", "", false))
+	testing.expect(t, !glob_match("", "foo", false))
 }
 
 @(test)
210
glob.odin
Normal file
@@ -0,0 +1,210 @@
|
|||||||
|
package findr
|
||||||
|
|
||||||
|
Range :: struct {
|
||||||
|
lo: u8,
|
||||||
|
hi: u8,
|
||||||
|
}
|
||||||
|
|
||||||
|
Class_Data :: struct {
|
||||||
|
negated: bool,
|
||||||
|
ranges: [dynamic]Range,
|
||||||
|
}
|
||||||
|
|
||||||
|
Token_Kind :: enum u8 { Char, Star, Globstar, Question, Class }
|
||||||
|
|
||||||
|
Token :: struct {
|
||||||
|
kind: Token_Kind,
|
||||||
|
byte: u8,
|
||||||
|
class_idx: u16,
|
||||||
|
}
|
||||||
|
|
||||||
|
GlobPattern :: struct {
|
||||||
|
tokens: [dynamic]Token,
|
||||||
|
classes: [dynamic]Class_Data,
|
||||||
|
anchored: bool,
|
||||||
|
}
|
||||||
|
|
||||||
|
glob_compile :: proc(pattern: string, anchored: bool) -> GlobPattern {
|
||||||
|
gp: GlobPattern
|
||||||
|
gp.tokens = make([dynamic]Token)
|
||||||
|
gp.classes = make([dynamic]Class_Data)
|
||||||
|
gp.anchored = anchored
|
||||||
|
|
||||||
|
i := 0
|
||||||
|
for i < len(pattern) {
|
||||||
|
c := pattern[i]
|
||||||
|
|
||||||
|
if c == '*' {
|
||||||
|
if i + 1 < len(pattern) && pattern[i + 1] == '*' {
|
||||||
|
prev_slash := i == 0 || pattern[i - 1] == '/'
|
||||||
|
at_end := i + 2 >= len(pattern)
|
||||||
|
next_slash := !at_end && pattern[i + 2] == '/'
|
||||||
|
|
||||||
|
if prev_slash && (next_slash || at_end) {
|
||||||
|
append(&gp.tokens, Token{kind = .Globstar})
|
||||||
|
if next_slash {
|
||||||
|
i += 3
|
||||||
|
} else {
|
||||||
|
i += 2
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
append(&gp.tokens, Token{kind = .Star})
|
||||||
|
i += 2
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
append(&gp.tokens, Token{kind = .Star})
|
||||||
|
i += 1
|
||||||
|
}
|
||||||
|
} else if c == '?' {
|
||||||
|
append(&gp.tokens, Token{kind = .Question})
|
||||||
|
i += 1
|
||||||
|
} else if c == '[' {
|
||||||
|
i += 1
|
||||||
|
negated := false
|
||||||
|
if i < len(pattern) && pattern[i] == '!' {
|
||||||
|
negated = true
|
||||||
|
i += 1
|
||||||
|
}
|
||||||
|
|
||||||
|
ranges := make([dynamic]Range)
|
||||||
|
|
||||||
|
if i < len(pattern) && pattern[i] == ']' {
|
||||||
|
append(&ranges, Range{lo = ']', hi = ']'})
|
||||||
|
i += 1
|
||||||
|
}
|
||||||
|
|
||||||
|
for i < len(pattern) && pattern[i] != ']' {
|
||||||
|
if i + 2 < len(pattern) && pattern[i + 1] == '-' && pattern[i + 2] != ']' {
|
||||||
|
append(&ranges, Range{lo = pattern[i], hi = pattern[i + 2]})
|
||||||
|
i += 3
|
||||||
|
} else {
|
||||||
|
append(&ranges, Range{lo = pattern[i], hi = pattern[i]})
|
||||||
|
i += 1
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
+			if i < len(pattern) {
+				i += 1
+			}
+
+			class_idx := u16(len(gp.classes))
+			append(&gp.classes, Class_Data{negated = negated, ranges = ranges})
+			append(&gp.tokens, Token{kind = .Class, class_idx = class_idx})
+		} else if c == '\\' {
+			i += 1
+			if i < len(pattern) {
+				append(&gp.tokens, Token{kind = .Char, byte = pattern[i]})
+				i += 1
+			}
+		} else {
+			append(&gp.tokens, Token{kind = .Char, byte = c})
+			i += 1
+		}
+	}
+
+	return gp
+}
+
+match_tokens :: proc(tokens: []Token, classes: []Class_Data, ti: int, path: string, pi: int) -> bool {
+	if ti >= len(tokens) {
+		return pi == len(path)
+	}
+
+	tok := tokens[ti]
+	switch tok.kind {
+	case .Char:
+		if pi < len(path) && path[pi] == tok.byte {
+			return match_tokens(tokens, classes, ti + 1, path, pi + 1)
+		}
+		return false
+
+	case .Question:
+		if pi < len(path) && path[pi] != '/' {
+			return match_tokens(tokens, classes, ti + 1, path, pi + 1)
+		}
+		return false
+
+	case .Star:
+		max_end := pi
+		for max_end < len(path) && path[max_end] != '/' {
+			max_end += 1
+		}
+		for end := max_end; end >= pi; end -= 1 {
+			if match_tokens(tokens, classes, ti + 1, path, end) {
+				return true
+			}
+		}
+		return false
+
+	case .Globstar:
+		if ti + 1 >= len(tokens) {
+			return true
+		}
+		if match_tokens(tokens, classes, ti + 1, path, pi) {
+			return true
+		}
+		for end := pi + 1; end <= len(path); end += 1 {
+			if path[end - 1] == '/' {
+				if match_tokens(tokens, classes, ti + 1, path, end) {
+					return true
+				}
+			}
+		}
+		return false
+
+	case .Class:
+		if pi >= len(path) {
+			return false
+		}
+		cd := classes[tok.class_idx]
+		ch := path[pi]
+		in_range := false
+		for r in cd.ranges {
+			if ch >= r.lo && ch <= r.hi {
+				in_range = true
+				break
+			}
+		}
+		if in_range != cd.negated {
+			return match_tokens(tokens, classes, ti + 1, path, pi + 1)
+		}
+		return false
+	}
+	return false
+}
+
+glob_match_compiled :: proc(gp: ^GlobPattern, path: string) -> bool {
+	tokens := gp.tokens[:]
+	classes := gp.classes[:]
+
+	if gp.anchored {
+		return match_tokens(tokens, classes, 0, path, 0)
+	}
+
+	if match_tokens(tokens, classes, 0, path, 0) {
+		return true
+	}
+	for i := 1; i < len(path); i += 1 {
+		if path[i - 1] == '/' {
+			if match_tokens(tokens, classes, 0, path, i) {
+				return true
+			}
+		}
+	}
+	return false
+}
+
+glob_destroy :: proc(gp: ^GlobPattern) {
+	for &cd in gp.classes {
+		delete(cd.ranges)
+	}
+	delete(gp.classes)
+	delete(gp.tokens)
+}
+
+glob_match :: proc(pattern: string, path: string, anchored: bool) -> bool {
+	gp := glob_compile(pattern, anchored)
+	result := glob_match_compiled(&gp, path)
+	glob_destroy(&gp)
+	return result
+}
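The `.Star` case in `match_tokens` is the part of the matcher that backtracks: it finds the longest run of non-`/` bytes, then retries the rest of the pattern at each shorter length. The sketch below isolates that idea. It is not the code from this change: `star_match` is a hypothetical name, it works directly on the pattern string instead of a compiled `Token` list, and it handles only `*`, `?`, and literal bytes.

```odin
package main

import "core:fmt"

// star_match (hypothetical helper) reports whether `pattern` matches all of `path`.
// '*' matches any run of bytes that does not contain '/', '?' matches one non-'/' byte.
star_match :: proc(pattern, path: string) -> bool {
	if len(pattern) == 0 {
		return len(path) == 0
	}
	c := pattern[0]
	if c == '*' {
		// Longest candidate: everything up to the next '/' (or end of path).
		max_end := 0
		for max_end < len(path) && path[max_end] != '/' {
			max_end += 1
		}
		// Greedy first, then back off one byte at a time.
		for end := max_end; end >= 0; end -= 1 {
			if star_match(pattern[1:], path[end:]) {
				return true
			}
		}
		return false
	}
	if c == '?' {
		return len(path) > 0 && path[0] != '/' && star_match(pattern[1:], path[1:])
	}
	return len(path) > 0 && path[0] == c && star_match(pattern[1:], path[1:])
}

main :: proc() {
	fmt.println(star_match("*.odin", "walker.odin"))         // true
	fmt.println(star_match("*.odin", "src/walker.odin"))     // false: '*' stops at '/'
	fmt.println(star_match("src/*.odin", "src/walker.odin")) // true
}
```

The second call is the interesting one: because `*` never consumes a `/`, an unanchored caller has to retry the pattern at each path component, which is what `glob_match_compiled` does when `gp.anchored` is false.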
prof.odin (24 changed lines)
@@ -1,11 +1,19 @@
 package findr
 
 import "base:runtime"
+import "core:os"
 import "core:prof/spall"
 import "core:sync"
+import "core:sys/linux"
 
 SPALL_ENABLED :: #config(SPALL_ENABLED, ODIN_DEBUG)
 
+SPALL_MAX_BYTES :: 1 * 1024 * 1024 * 1024
+
+_SPALL_LIMIT_MSG := "findr: spall recording reached 1 GiB limit, exiting\n"
+
+spall_bytes_written: int
+
 spall_ctx: spall.Context
 
 @(thread_local) spall_buffer: spall.Buffer
@@ -17,6 +25,14 @@ spall_enter :: proc "contextless" (
 	loc: runtime.Source_Code_Location,
 ) {
 	when SPALL_ENABLED {
+		if spall_buffer.head + spall.BEGIN_EVENT_MAX > len(spall_buffer.data) {
+			spall_bytes_written += spall_buffer.head
+			if spall_bytes_written >= SPALL_MAX_BYTES {
+				linux.write(2, transmute([]u8)_SPALL_LIMIT_MSG)
+				spall.buffer_flush(&spall_ctx, &spall_buffer)
+				os.exit(0)
+			}
+		}
 		spall._buffer_begin(&spall_ctx, &spall_buffer, "", "", loc)
 	}
 }
@@ -27,6 +43,14 @@ spall_exit :: proc "contextless" (
 	loc: runtime.Source_Code_Location,
 ) {
 	when SPALL_ENABLED {
+		if spall_buffer.head + size_of(spall.End_Event) > len(spall_buffer.data) {
+			spall_bytes_written += spall_buffer.head
+			if spall_bytes_written >= SPALL_MAX_BYTES {
+				linux.write(2, transmute([]u8)_SPALL_LIMIT_MSG)
+				spall.buffer_flush(&spall_ctx, &spall_buffer)
+				os.exit(0)
+			}
+		}
 		spall._buffer_end(&spall_ctx, &spall_buffer)
 	}
 }
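In the hunks above, `spall_bytes_written` is a plain global while `spall_buffer` is thread-local, so several walker threads may update the counter at the same time. If that matters for the 1 GiB cap, one option is to do the accounting with an atomic add. The following is a minimal sketch under that assumption, not part of this change; `account_flush`, `bytes_written`, and the small `MAX_BYTES` budget are hypothetical names and values chosen for the demo.

```odin
package main

import "core:fmt"
import "core:sync"

MAX_BYTES :: 1024 // hypothetical small budget so the demo trips quickly

bytes_written: int // shared across threads; only touched through atomics

// account_flush (hypothetical helper) adds n bytes to the shared total and
// reports whether the budget has been reached after this flush.
account_flush :: proc "contextless" (n: int) -> bool {
	old := sync.atomic_add(&bytes_written, n) // returns the value before the add
	return old + n >= MAX_BYTES
}

main :: proc() {
	fmt.println(account_flush(600)) // false: 600 < 1024
	fmt.println(account_flush(600)) // true: 1200 >= 1024
}
```

Because the decision uses the value returned by the atomic add rather than a second read of the global, two threads flushing at once cannot both miss the limit.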
walker.odin (236 changed lines)
@@ -4,10 +4,13 @@ import "core:fmt"
 import "core:os"
 import "core:strings"
 import "core:sync"
+import "core:sync/chan"
 import "core:sys/linux"
 import "core:text/regex"
 import "core:thread"
 
+OUTPUT_BUF_SIZE :: 64 * 1024
+
 IgnoreMode :: enum {
 	Respected, // skip gitignored, prune ignored dirs (fd -H default)
 	All, // ignore .gitignore entirely, descend everywhere (fd -HI)
@@ -21,11 +24,6 @@ WalkOptions :: struct {
 	ignore_mode: IgnoreMode,
 }
 
-RawEntry :: struct {
-	name: string,
-	type: linux.Dirent_Type,
-}
-
 GIContext :: struct {
 	gi: ^Gitignore, // nil if this dir had no .gitignore
 	base_rel: string, // relative path from repo root to this dir
@@ -43,11 +41,10 @@ WalkerPool :: struct {
 	queue: [dynamic]WorkItem,
 	queue_mutex: sync.Mutex,
 	queue_sema: sync.Atomic_Sema,
-	results: ^[dynamic]string,
-	results_mutex: sync.Mutex,
+	result_chan: chan.Chan([]u8),
 	active: i64,
 	done: sync.One_Shot_Event,
-	threads: [dynamic]^thread.Thread,
+	threads: []^thread.Thread,
 	opts: WalkOptions,
 	pattern_re: regex.Regular_Expression,
 	has_pattern: bool,
@@ -56,14 +53,44 @@ WalkerPool :: struct {
 	contexts_lock: sync.Mutex,
 }
 
-walk :: proc(roots: []string, results: ^[dynamic]string, opts: WalkOptions, thread_count: int) {
+flush_buf :: proc(ch: chan.Chan([]u8), local: ^[dynamic]u8) {
+	if len(local) == 0 do return
+	batch := local[:]
+	local^ = make([dynamic]u8, 0, OUTPUT_BUF_SIZE)
+	chan.send(ch, batch)
+}
+
+append_path :: proc(buf: ^[dynamic]u8, parent, name: string, trailing_slash: bool) {
+	need_sep := len(parent) > 0 && parent[len(parent) - 1] != '/'
+	size := len(parent) + len(name) + 1
+	if need_sep do size += 1
+	if trailing_slash do size += 1
+
+	old_len := len(buf)
+	reserve(buf, old_len + size)
+	resize(buf, old_len + size)
+
+	pos := old_len
+	pos += copy(buf[pos:], parent)
+	if need_sep {buf[pos] = '/'; pos += 1}
+	pos += copy(buf[pos:], name)
+	if trailing_slash {buf[pos] = '/'; pos += 1}
+	buf[pos] = '\n'
+}
+
+walk_stream :: proc(
+	roots: []string,
+	result_chan: chan.Chan([]u8),
+	opts: WalkOptions,
+	thread_count: int,
+) {
 	if len(roots) == 0 do return
 
 	pool := new(WalkerPool)
 	pool.queue = make([dynamic]WorkItem)
-	pool.results = results
+	pool.result_chan = result_chan
 	pool.active = i64(len(roots))
-	pool.threads = make([dynamic]^thread.Thread)
+	pool.threads = make([]^thread.Thread, thread_count)
 	pool.all_contexts = make([dynamic]^GIContext)
 	pool.opts = opts
 	pool.exclude_gi = nil
@@ -100,7 +127,7 @@ walk :: proc(roots: []string, results: ^[dynamic]string, opts: WalkOptions, thre
 		t.data = rawptr(pool)
 		t.init_context = context
 		thread.start(t)
-		append(&pool.threads, t)
+		pool.threads[i] = t
 	}
 
 	sync.one_shot_event_wait(&pool.done)
@@ -142,14 +169,66 @@ walk :: proc(roots: []string, results: ^[dynamic]string, opts: WalkOptions, thre
 	free(pool)
 }
 
+Collector_Data :: struct {
+	ch: chan.Chan([]u8),
+	results: ^[dynamic]string,
+}
+
+collect_worker :: proc(t: ^thread.Thread) {
+	data := cast(^Collector_Data)t.data
+	for {
+		batch, ok := chan.recv(data.ch)
+		if !ok do break
+		start := 0
+		for i in 0 ..< len(batch) {
+			if batch[i] == '\n' {
+				if i > start {
+					s, _ := strings.clone(string(batch[start:i]))
+					append(data.results, s)
+				}
+				start = i + 1
+			}
+		}
+		delete(batch)
+	}
+}
+
+walk :: proc(roots: []string, results: ^[dynamic]string, opts: WalkOptions, thread_count: int) {
+	if len(roots) == 0 do return
+
+	ch, _ := chan.create(chan.Chan([]u8), max(2 * thread_count, 2), context.allocator)
+	defer chan.destroy(ch)
+
+	data := new(Collector_Data)
+	data.ch = ch
+	data.results = results
+
+	collector := thread.create(collect_worker)
+	collector.data = rawptr(data)
+	collector.init_context = context
+	thread.start(collector)
+
+	walk_stream(roots, ch, opts, thread_count)
+
+	chan.close(ch)
+	thread.join(collector)
+	thread.destroy(collector)
+	free(data)
+}
+
 walk_worker :: proc(t: ^thread.Thread) {
 	pool := cast(^WalkerPool)t.data
 
 	prof_thread_init("walker")
 	defer prof_thread_destroy()
 
-	local_results := make([dynamic]string, 0, 256)
-	defer delete(local_results)
+	local_buf := make([dynamic]u8, 0, OUTPUT_BUF_SIZE)
+	defer {
+		if len(local_buf) > 0 {
+			flush_buf(pool.result_chan, &local_buf)
+		}
+		delete(local_buf)
+	}
 
 	for {
 		sync.atomic_sema_wait(&pool.queue_sema)
@@ -167,30 +246,36 @@ walk_worker :: proc(t: ^thread.Thread) {
 		ordered_remove(&pool.queue, last)
 		sync.mutex_unlock(&pool.queue_mutex)
 
-		process_dir(pool, item, &local_results)
+		process_dir(pool, item, &local_buf)
 		delete(item.path)
 		if len(item.rel) > 0 {delete(item.rel)}
 
+		if len(local_buf) >= OUTPUT_BUF_SIZE {
+			flush_buf(pool.result_chan, &local_buf)
+		}
+
 		old := sync.atomic_sub_explicit(&pool.active, 1, .Release)
 		if old == 1 {
 			sync.one_shot_event_signal(&pool.done)
 		}
 	}
-
-	if len(local_results) > 0 {
-		sync.mutex_lock(&pool.results_mutex)
-		for res in local_results {
-			append(pool.results, res)
-		}
-		sync.mutex_unlock(&pool.results_mutex)
-	}
 }
 
-process_dir :: proc(pool: ^WalkerPool, item: WorkItem, local_results: ^[dynamic]string) {
+process_dir :: proc(pool: ^WalkerPool, item: WorkItem, local_buf: ^[dynamic]u8) {
 	dir_path := item.path
 
+	cpath := strings.clone_to_cstring(dir_path)
+	if cpath == nil do return
+	defer delete(cpath)
+
+	fd, open_err := linux.open(cpath, {.DIRECTORY, .CLOEXEC})
+	if open_err != .NONE do return
+	defer linux.close(fd)
+
 	has_git := false
-	entries := read_dir_entries(dir_path, &has_git)
-	defer free_entries(&entries)
+	if pool.opts.ignore_mode != .All {
+		has_git = has_git_dir(fd)
+	}
+
 	gi_ctx := item.gi_ctx
 	rel := item.rel
@@ -202,7 +287,10 @@ process_dir :: proc(pool: ^WalkerPool, item: WorkItem, local_results: ^[dynamic]
 
 	child_in_repo := has_git || item.in_repo
 
-	gi := load_ignore_patterns(dir_path, child_in_repo)
+	gi: ^Gitignore = nil
+	if pool.opts.ignore_mode != .All {
+		gi = load_ignore_patterns(dir_path, child_in_repo)
+	}
 	if gi != nil {
 		new_ctx := new(GIContext)
 		new_ctx.gi = gi
@@ -218,23 +306,31 @@ process_dir :: proc(pool: ^WalkerPool, item: WorkItem, local_results: ^[dynamic]
 		gi_ctx = new_ctx
 	}
 
+	buf: [32 * 1024]u8
 	rel_buf: [4096]u8
 
-	for entry in entries {
-		if entry.name == ".git" do continue
-
-		is_dir := entry.type == .DIR
-		is_nondir := entry.type != .DIR
-
-		if pool.exclude_gi != nil && is_ignored(pool.exclude_gi, entry.name, is_dir) {
+	for {
+		n, errno := linux.getdents(fd, buf[:])
+		if n <= 0 || errno != .NONE do break
+
+		offs := 0
+		for d in linux.dirent_iterate_buf(buf[:n], &offs) {
+			name := linux.dirent_name(d)
+			if name == "." || name == ".." do continue
+			if name == ".git" do continue
+
+			is_dir := d.type == .DIR
+			is_nondir := d.type != .DIR
+
+			if pool.exclude_gi != nil && is_ignored(pool.exclude_gi, name, is_dir) {
 				continue
 			}
 
-		if !pool.opts.include_hidden && len(entry.name) > 0 && entry.name[0] == '.' {
+			if !pool.opts.include_hidden && len(name) > 0 && name[0] == '.' {
 				continue
 			}
 
-		entry_rel := build_rel(rel_buf[:], rel, entry.name)
+			entry_rel := build_rel(rel_buf[:], rel, name)
 
 			ignored := false
 			if gi_ctx != nil && pool.opts.ignore_mode != .All {
@@ -249,19 +345,26 @@ process_dir :: proc(pool: ^WalkerPool, item: WorkItem, local_results: ^[dynamic]
 			}
 
 			if is_dir {
-			if should_emit && matches_pattern(pool, entry.name) {
-				dir_path_out := join_path_dir(dir_path, entry.name)
-				append(local_results, dir_path_out)
+				if should_emit && matches_pattern(pool, name) {
+					append_path(local_buf, dir_path, name, true)
 				}
 				if !ignored {
 					child_rel, _ := strings.clone(entry_rel)
-				child_path := join_path(dir_path, entry.name)
-				push_work(pool, WorkItem{path = child_path, rel = child_rel, gi_ctx = gi_ctx, in_repo = child_in_repo})
+					child_path := join_path(dir_path, name)
+					push_work(
+						pool,
+						WorkItem {
+							path = child_path,
+							rel = child_rel,
+							gi_ctx = gi_ctx,
+							in_repo = child_in_repo,
+						},
+					)
 				}
 			} else if is_nondir {
-			if should_emit && matches_pattern(pool, entry.name) {
-				full_path := join_path(dir_path, entry.name)
-				append(local_results, full_path)
+				if should_emit && matches_pattern(pool, name) {
+					append_path(local_buf, dir_path, name, false)
+				}
 			}
 		}
 	}
@@ -285,7 +388,8 @@ check_chain :: proc(ctx: ^GIContext, entry_rel: string, is_dir: bool) -> bool {
 relative_to :: proc(entry_rel, base_rel: string) -> string {
 	if len(base_rel) == 0 do return entry_rel
 	prefix_len := len(base_rel)
-	if len(entry_rel) > prefix_len && entry_rel[prefix_len] == '/' &&
+	if len(entry_rel) > prefix_len &&
+	   entry_rel[prefix_len] == '/' &&
 	   strings.has_prefix(entry_rel, base_rel) {
 		return entry_rel[prefix_len + 1:]
 	}
@@ -318,46 +422,13 @@ push_work :: proc(pool: ^WalkerPool, item: WorkItem) {
 	sync.atomic_sema_post(&pool.queue_sema)
 }
 
-read_dir_entries :: proc(dir_path: string, has_git: ^bool) -> [dynamic]RawEntry {
-	entries := make([dynamic]RawEntry)
-
-	cpath := strings.clone_to_cstring(dir_path)
-	if cpath == nil do return entries
-
-	fd, err := linux.open(cpath, {.DIRECTORY, .CLOEXEC})
-	delete(cpath)
-	if err != .NONE do return entries
-
-	buf: [8192]u8
-	has_git^ = false
-
-	for {
-		n, errno := linux.getdents(fd, buf[:])
-		if n <= 0 || errno != .NONE do break
-
-		offs := 0
-		for d in linux.dirent_iterate_buf(buf[:n], &offs) {
-			name := linux.dirent_name(d)
-			if name == "." || name == ".." do continue
-
-			if name == ".git" && d.type == .DIR {
-				has_git^ = true
-			}
-
-			cloned := strings.clone(name)
-			append(&entries, RawEntry{name = cloned, type = d.type})
-		}
-	}
-
-	linux.close(fd)
-	return entries
-}
-
-free_entries :: proc(entries: ^[dynamic]RawEntry) {
-	for &entry in entries {
-		delete(entry.name)
-	}
-	delete(entries^)
+has_git_dir :: proc(fd: linux.Fd) -> bool {
+	git_fd, err := linux.openat(fd, ".git", {.DIRECTORY, .CLOEXEC})
+	if err == .NONE {
+		linux.close(git_fd)
+		return true
+	}
+	return false
 }
 
 load_ignore_patterns :: proc(dir_path: string, in_repo: bool) -> ^Gitignore {
@@ -422,3 +493,4 @@ join_path_dir :: proc(parent, child: string) -> string {
 	buf[pos] = '/'
 	return string(buf)
 }
+
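The core of the walker change is the wire format between workers and the collector: each worker appends `parent/name\n` records into one byte buffer (`append_path`), and the receiver splits on `\n` (`collect_worker`). The sketch below shows that round trip in a single thread, without the channel. It is an illustration under simplifying assumptions rather than the code from this change: `append_line` is a hypothetical stand-in for `append_path` (no trailing-slash option), and the receiver prints slices of the buffer instead of cloning them.

```odin
package main

import "core:fmt"

// append_line (hypothetical helper) writes parent + "/" + name + "\n" into buf,
// adding the separator only when parent does not already end in '/'.
append_line :: proc(buf: ^[dynamic]u8, parent, name: string) {
	p := transmute([]u8)parent
	n := transmute([]u8)name
	append(buf, ..p)
	if len(parent) > 0 && parent[len(parent) - 1] != '/' {
		append(buf, u8('/'))
	}
	append(buf, ..n)
	append(buf, u8('\n'))
}

main :: proc() {
	buf := make([dynamic]u8, 0, 64)
	defer delete(buf)

	// Producer side: many paths, one growing allocation.
	append_line(&buf, "src", "walker.odin")
	append_line(&buf, "src/", "glob.odin")

	// Consumer side: split the batch on '\n', skipping empty records.
	start := 0
	for i in 0 ..< len(buf) {
		if buf[i] == '\n' {
			if i > start {
				fmt.println(string(buf[start:i]))
			}
			start = i + 1
		}
	}
}
```

Run as written, this prints `src/walker.odin` and `src/glob.odin`. A path that itself contains a newline would be split into two records in this format, which is worth keeping in mind if such filenames need to be reported intact.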