shuffle3-lean
Redesign/upgrade of shuffle3
Goals
- Functioning in-place shuffle/unshuffle
  - Shuffle
  - Unshuffle
- Usable in-place shuffle/unshuffle from command line
  - Shuffle
  - Unshuffle
- Functioning out-of-place/in-memory shuffle/unshuffle
  - Shuffle
  - Unshuffle
- Usable out-of-place shuffle/unshuffle from command line
  - Shuffle
  - Unshuffle
NO compatibility with shuffle3
shuffle3's drng PRNG algorithm uses an outdated global state backend. We don't want to reuse this.
As a result, output from shuffle3 and shuffle3-lean is different.
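To illustrate what moving away from a global state backend can look like, here is a minimal sketch of a generator whose state is an explicit value passed by the caller. It uses xorshift64 purely as a stand-in: it is not the algorithm used by shuffle3 or shuffle3-lean, and `struct rng`, `rng_seed`, `rng_next` and `rng_below` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator state: everything the PRNG needs lives here,
 * so there is no hidden global to share, reset, or clobber. */
struct rng {
    uint64_t state;
};

/* Seed a generator. xorshift64 must never hold a zero state,
 * so a zero seed is replaced with an arbitrary nonzero constant. */
static struct rng rng_seed(uint64_t seed)
{
    struct rng r = { seed ? seed : UINT64_C(0x9E3779B97F4A7C15) };
    return r;
}

/* Advance the generator one step (xorshift64 with shifts 13/7/17). */
static uint64_t rng_next(struct rng *r)
{
    uint64_t x = r->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    r->state = x;
    return x;
}

/* Value in [0, bound). Plain modulo keeps the sketch short,
 * at the cost of a slight bias for bounds that do not divide 2^64. */
static uint64_t rng_below(struct rng *r, uint64_t bound)
{
    assert(bound > 0);
    return rng_next(r) % bound;
}
```

Because each `struct rng` is independent, two generators seeded with the same value yield the same sequence no matter how calls to them are interleaved, which is the property a shuffle/unshuffle pair relies on.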
Improvements
- ~70-80x speedup from shuffle3 1.0
- Huge reduction in syscalls
- Takes advantage of the kernel's fs cache
- Can properly handle large files without core dumping
- Doesn't dump huge amounts of trash onto each stack frame
Performance
hyperfine reports a 700-800% speedup over v1.
It's easy to see why.
V1 flamegraph
V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes. This causes massive syscall overhead.
V2 flamegraph
V2, by contrast, uses a single mmap().
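As a rough sketch of why a single mmap() removes the per-access syscalls, the snippet below maps a whole file read/write and then swaps two bytes with ordinary memory accesses. The helper names (`map_file`, `unmap_file`, `swap_bytes`) are hypothetical and not taken from the project, which may set up its mapping differently.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose open/fstat/mmap under strict -std= */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read/write. Returns NULL on failure or for an empty file. */
static unsigned char *map_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED makes stores to the mapping reach the file itself. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Flush dirty pages and drop the mapping. Returns 0 on success. */
static int unmap_file(unsigned char *data, size_t len)
{
    int rc = msync(data, len, MS_SYNC);
    if (munmap(data, len) != 0)
        rc = -1;
    return rc;
}

/* One swap of a shuffle: two loads and two stores,
 * with no read()/lseek()/write() call per element. */
static void swap_bytes(unsigned char *data, size_t i, size_t j)
{
    unsigned char tmp = data[i];
    data[i] = data[j];
    data[j] = tmp;
}
```

After the one-time setup, every element access is served from the page cache through the mapping, so the syscall count no longer grows with the number of swaps.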
Memory usage
The memusage graph for v1 shows extremely inefficient stack usage.
(The green is supposed to be a line, not a bar.)
This is due to how the unshuffler buffers RNG results.
v1 naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled.
It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.
This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.
V2 improvement
The memusage graph for v2 is a lot more sane.
v2 instead allocates this buffer on the heap. Note the stable stack and heap usage.
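The sketch below shows one way an unshuffler can buffer its RNG results on the heap. It assumes a Fisher-Yates shuffle over bytes and one `size_t` per draw, which would match the 8x figure on a typical 64-bit target, but that is an assumption rather than something the project states. The xorshift64 step is a stand-in generator and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in PRNG step (xorshift64); the state must be nonzero. */
static uint64_t next_rand(uint64_t *s)
{
    assert(*s != 0);
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

static void swap_bytes(unsigned char *d, size_t i, size_t j)
{
    unsigned char t = d[i];
    d[i] = d[j];
    d[j] = t;
}

/* Fisher-Yates: swap positions n-1 down to 1 with a random slot at or below. */
static void shuffle(unsigned char *d, size_t n, uint64_t seed)
{
    for (size_t i = n; i > 1; i--)
        swap_bytes(d, i - 1, (size_t)(next_rand(&seed) % i));
}

/* Undo shuffle(): regenerate the same draws, then replay the swaps backwards.
 * Returns 0 on success, -1 if the draw buffer cannot be allocated. */
static int unshuffle(unsigned char *d, size_t n, uint64_t seed)
{
    if (n < 2)
        return 0;
    if (n - 1 > SIZE_MAX / sizeof(size_t))
        return -1; /* buffer size would overflow */

    /* One size_t per draw. On the heap an oversized request fails
     * cleanly; as a VLA it would overflow the stack instead. */
    size_t *draws = malloc((n - 1) * sizeof *draws);
    if (draws == NULL)
        return -1;

    size_t k = 0;
    for (size_t i = n; i > 1; i--)
        draws[k++] = (size_t)(next_rand(&seed) % i);

    /* shuffle() performed the i == 2 swap last, so it is undone first. */
    for (size_t i = 2; i <= n; i++)
        swap_bytes(d, i - 1, draws[--k]);

    free(draws);
    return 0;
}
```

Shuffling needs no such buffer because it consumes each draw as it is produced, which fits the observation that shuffling a large file can succeed where unshuffling the same file crashes.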
Todo
- impl rng
- impl shuffling
- impl out-of-place shuffling
- arg parsing and dispatch
  - simple parsing
  - complex parsing