* shuffle3-lean
Redesign/upgrade of =shuffle3=.
* Goals
- [X] Functioning in-place shuffle/unshuffle
  - [X] Shuffle
  - [X] Unshuffle
- [X] Usable in-place shuffle/unshuffle from the command line
  - [X] Shuffle
  - [X] Unshuffle
- [ ] Functioning out-of-place/in-memory shuffle/unshuffle
  - [ ] Shuffle
  - [ ] Unshuffle
- [ ] Usable out-of-place shuffle/unshuffle from the command line
  - [ ] Shuffle
  - [ ] Unshuffle
** NO compatibility with =shuffle3=
=shuffle3='s ~drng~ PRNG algorithm uses an outdated global-state backend, which we don't want to reuse.

As a result, =shuffle3= and =shuffle3-lean= produce different output.
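This README doesn't say which PRNG =shuffle3-lean= uses instead. As a minimal sketch of the alternative to global state, the generator below keeps its state in a caller-owned struct. The splitmix64 algorithm and the names ~rng_t~, ~rng_seed~ and ~rng_next~ are assumptions for illustration, not the project's actual API. Because no state is shared, two generators seeded identically always produce the same sequence, regardless of what other code draws in between.

#+begin_src c
#include <stdint.h>

/* Hypothetical generator state: owned by the caller, no globals.
 * (Illustrative only; not shuffle3-lean's real PRNG.) */
typedef struct {
    uint64_t state;
} rng_t;

/* Seed a generator. */
static void rng_seed(rng_t *rng, uint64_t seed) {
    rng->state = seed;
}

/* splitmix64 step: advance this generator's state and return the
 * next 64-bit output. Only *rng is touched. */
static uint64_t rng_next(rng_t *rng) {
    uint64_t z = (rng->state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
#+end_src

Usage: seed a ~rng_t~ with ~rng_seed(&r, seed)~ and pass ~&r~ to every ~rng_next~ call. A shuffle and its unshuffle using the same seed then see identical draws.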
* Improvements
- *~70-80x* speedup over =shuffle3= 1.0
- Huge reduction in syscalls
- Takes advantage of the kernel's filesystem cache
- Handles large files properly, without core dumping
- No longer dumps huge buffers onto each stack frame
** Performance
[[https://github.com/sharkdp/hyperfine][hyperfine]] reports a *700-800%* speedup over =v1=.

It's easy to see why.
*** V1 flamegraph
V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes. This causes massive syscall overhead.

[[./profiling/release-flame-old.svg]]
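To make that overhead concrete, here is a minimal sketch of a file-backed pseudo-array. It is hypothetical (v1's real adaptor may differ): each element access issues an ~lseek()~ plus a ~read()~ or ~write()~, so one byte swap costs eight syscalls. A shuffle that performs n-1 swaps over an n-byte file, as a Fisher-Yates shuffle does, would therefore issue on the order of 8n syscalls.

#+begin_src c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical v1-style "pseudo-array": every element access is a
 * separate lseek() + read()/write() pair on the underlying file. */
typedef struct {
    int fd;          /* open file descriptor, read/write */
    long syscalls;   /* syscall counter, for illustration only */
} file_array_t;

/* Read byte i. Returns -1 on error, else the byte value (0..255). */
static int file_array_get(file_array_t *fa, off_t i) {
    unsigned char b;
    fa->syscalls += 2;
    if (lseek(fa->fd, i, SEEK_SET) == (off_t)-1) return -1;
    if (read(fa->fd, &b, 1) != 1) return -1;
    return b;
}

/* Write byte i. Returns 0 on success, -1 on error. */
static int file_array_set(file_array_t *fa, off_t i, unsigned char v) {
    fa->syscalls += 2;
    if (lseek(fa->fd, i, SEEK_SET) == (off_t)-1) return -1;
    if (write(fa->fd, &v, 1) != 1) return -1;
    return 0;
}

/* Swap two bytes: 2 gets + 2 sets = 8 syscalls per swap. */
static int file_array_swap(file_array_t *fa, off_t i, off_t j) {
    int a = file_array_get(fa, i);
    int b = file_array_get(fa, j);
    if (a < 0 || b < 0) return -1;
    if (file_array_set(fa, i, (unsigned char)b) < 0) return -1;
    return file_array_set(fa, j, (unsigned char)a);
}
#+end_src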
*** V2 flamegraph
V2, by contrast, uses a single ~mmap()~.

[[./profiling/release-flame.svg]]
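A minimal sketch of the ~mmap()~ approach, assuming the whole file is mapped read/write with ~MAP_SHARED~. The helper names are hypothetical. After the single mapping call, swaps are ordinary memory loads and stores. The kernel pages data in and out through its page cache instead of servicing one syscall per access.

#+begin_src c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Map the whole file read/write with one mmap() call (hypothetical helper).
 * The fd must be open O_RDWR for a shared, writable mapping. */
static unsigned char *map_file(int fd, size_t *len_out) {
    struct stat st;
    /* mmap() rejects zero-length mappings, so bail out on empty files. */
    if (fstat(fd, &st) < 0 || st.st_size <= 0) return NULL;
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}

/* Swap two bytes: plain loads and stores, no syscalls. */
static void swap_bytes(unsigned char *buf, size_t i, size_t j) {
    unsigned char t = buf[i];
    buf[i] = buf[j];
    buf[j] = t;
}

/* Flush dirty pages back to the file and release the mapping. */
static int unmap_file(unsigned char *buf, size_t len) {
    if (msync(buf, len, MS_SYNC) < 0) return -1;
    return munmap(buf, len);
}
#+end_src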
** Memory usage
The [[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]] graph for =v1= shows extremely inefficient stack usage.

[[./profiling/old-mem.png]]

(The green is supposed to be a line, not a bar.)

This is due to how the unshuffler buffers RNG results.

=v1= naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled.

It puts this massive buffer on the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.

This can cause a segfault when unshuffling a large file, even though shuffling a file of the same size might succeed.
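The 8x figure is consistent with storing one 64-bit value per byte of input. That layout is an assumption here, not taken from the v1 source. Under it, the buffer for an n-byte file takes 8n bytes. With a typical 8 MiB default stack limit on Linux (check ~ulimit -s~), any file larger than about 1 MiB would overflow the stack. The arithmetic as a small sketch (hypothetical helper names):

#+begin_src c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 64-bit RNG result is buffered per byte of input,
 * which is what makes the buffer 8x the file size. */
static size_t unshuffle_buffer_bytes(size_t file_bytes) {
    return file_bytes * sizeof(uint64_t);
}

/* Largest file whose buffer still fits in a given stack budget. */
static size_t max_file_for_stack(size_t stack_bytes) {
    return stack_bytes / sizeof(uint64_t);
}
#+end_src

For example, ~unshuffle_buffer_bytes(100 * 1024 * 1024)~ gives 800 MiB for a 100 MiB file, far beyond any default stack size.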
*** V2 improvement
The ~memusage~ graph for =v2= is much saner.

[[./profiling/mem.png]]

=v2= instead allocates this buffer on the heap. Note the stable stack and heap usage.
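A minimal sketch of an in-place shuffle/unshuffle pair that keeps the RNG buffer on the heap. It assumes a Fisher-Yates shuffle driven by a splitmix64 generator. Both are stand-ins: the project's actual algorithm, PRNG, and index mapping may differ, and the modulo reduction below is slightly biased. Unshuffling replays the same draws into a ~malloc()~ed buffer and undoes the swaps in reverse order. On 64-bit targets ~size_t~ is 8 bytes, so this buffer is also roughly 8x the input size. Because it lives on the heap, only available memory limits it, not the stack size.

#+begin_src c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* splitmix64 with caller-owned state (stand-in for the real PRNG). */
static uint64_t next_u64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Index in [0, bound]; modulo bias ignored for brevity. */
static size_t pick(uint64_t *state, size_t bound) {
    return (size_t)(next_u64(state) % ((uint64_t)bound + 1));
}

static void swap(unsigned char *buf, size_t i, size_t j) {
    unsigned char t = buf[i];
    buf[i] = buf[j];
    buf[j] = t;
}

/* In-place Fisher-Yates: draw t fixes position i = n-1-t by swapping
 * it with a partner chosen from [0, i]. */
static void shuffle(unsigned char *buf, size_t n, uint64_t seed) {
    uint64_t st = seed;
    for (size_t t = 0; t + 1 < n; t++) {
        size_t i = n - 1 - t;
        swap(buf, i, pick(&st, i));
    }
}

/* Replay the same draws into a heap buffer, then undo the swaps in
 * reverse order (each swap is its own inverse). Returns -1 if the
 * allocation fails. */
static int unshuffle(unsigned char *buf, size_t n, uint64_t seed) {
    if (n < 2) return 0;
    size_t *draws = malloc((n - 1) * sizeof *draws);
    if (draws == NULL) return -1;
    uint64_t st = seed;
    for (size_t t = 0; t + 1 < n; t++)
        draws[t] = pick(&st, n - 1 - t);
    for (size_t t = n - 1; t-- > 0;)
        swap(buf, n - 1 - t, draws[t]);
    free(draws);
    return 0;
}
#+end_src

Calling ~shuffle(buf, n, seed)~ and then ~unshuffle(buf, n, seed)~ with the same seed restores the original bytes.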
* Todo
- [X] impl rng
- [X] impl shuffling
- [ ] impl out-of-place shuffling
- [-] arg parsing and dispatch
  - [X] simple parsing
  - [ ] complex parsing