* shuffle3-lean Redegisn/upgrade of =shuffle3= * Goals - [X] Functioning in-place shuffle/unshuffle - [X] Shuffle - [X] Unshuffle - [X] Usable in-place s/us from command line - [X] Shuffle - [X] Unshuffle - [ ] Functioning out-of-place/in-memory shuffle/unshuffle - [ ] Shuffle - [ ] Unshuffle - [ ] Usable out-of-place s/us from command line - [ ] Shuffle - [ ] Unshuffle ** NO compatibility with =shuffle3= =shuffle3='s ~drng~ PRNG algorithm uses an outdated global state backend. We don't want to reuse this. As a result, output from =shuffle3= and =shuffle3-lean= is different. * Improvements - *~70-80x* speedup from shuffle3 1.0 - Huge reduction in syscalls - Takes advantage of the kernel's fs cache - Can properly handle large files without core dumping - Doesn't dump huge amounts of trash onto each stack frame ** Performance [[https://github.com/sharkdp/hyperfine][hyperfine]] reports a *700-800%* speedup over =v1=. It's easy to see why. *** V1 flamegraph V1 uses a pesudo-array adaptor to perform filesystem reads, seeks, and writes. This causes a massive syscall overhead. [[./profiling/release-flame-old.svg]] *** V2 flamegraph Whereas V2 uses a single ~mmap()~. [[./profiling/release-flame.svg]] ** Memory usage The [[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]] graph for =v1= shows extremely inefficient stack usage. [[./profiling/old-mem.png]] ( the green is supposed to be a line, not a bar) This is due to how the unshuffler buffers RNG results. =v1= naively used VLAs to store this buffer, which can baloon to 8 times the size of the file being unshuffled. It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage. This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed. *** V2 improvement The ~memusage~ graph for =v2= is a lot more sane. [[./profiling/mem.png]] ~v2~ instead allocates this buffer on the heap. Note the stable stack and heap usage. * Todo - [X] impl rng - [X] impl shuffling - [ ] impl out-of-place shuffling - [-] arg parsing and dispatch - [X] simple parsing - [ ] complex parsing