* shuffle3-lean
Redesign/upgrade of =shuffle3=.
* Goals
- [X] Functioning in-place shuffle/unshuffle
  - [X] Shuffle
  - [X] Unshuffle
- [X] Usable in-place shuffle/unshuffle from the command line
  - [X] Shuffle
  - [X] Unshuffle
- [ ] Functioning out-of-place/in-memory shuffle/unshuffle
  - [ ] Shuffle
  - [ ] Unshuffle
- [ ] Usable out-of-place shuffle/unshuffle from the command line
  - [ ] Shuffle
  - [ ] Unshuffle
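The file doesn't name the shuffling algorithm. Assuming a Fisher-Yates shuffle driven by a seeded PRNG, an in-place shuffle/unshuffle pair might look like the sketch below. The unshuffle side regenerates the same swap indices, buffers them (the later memory notes say the unshuffler buffers RNG results), and undoes the swaps in reverse order. The PRNG (splitmix64) and every name here (~shuffle~, ~unshuffle~, ~rng_t~) are hypothetical choices for illustration, not =shuffle3-lean='s actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical PRNG (splitmix64), chosen only for illustration. */
typedef struct { uint64_t s; } rng_t;

static uint64_t rng_next(rng_t *r) {
    uint64_t z = (r->s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Index in [0, bound); modulo bias is ignored to keep the sketch short. */
static size_t rng_index(rng_t *r, size_t bound) {
    return (size_t)(rng_next(r) % bound);
}

static void swap_bytes(unsigned char *buf, size_t a, size_t b) {
    unsigned char t = buf[a];
    buf[a] = buf[b];
    buf[b] = t;
}

/* In-place Fisher-Yates: for k = n-1 down to 1, swap buf[k] with a random buf[j], j <= k. */
void shuffle(unsigned char *buf, size_t n, uint64_t seed) {
    rng_t r = { seed };
    for (size_t i = n; i > 1; i--)
        swap_bytes(buf, i - 1, rng_index(&r, i));
}

/* Unshuffle: replay the PRNG to recover every swap index, buffer them on the heap,
   then undo the swaps in reverse order (each swap is its own inverse). */
int unshuffle(unsigned char *buf, size_t n, uint64_t seed) {
    if (n < 2)
        return 0;
    size_t *js = calloc(n - 1, sizeof *js);
    if (js == NULL)
        return -1;
    rng_t r = { seed };
    for (size_t i = n; i > 1; i--)
        js[n - i] = rng_index(&r, i);
    for (size_t i = 2; i <= n; i++)
        swap_bytes(buf, i - 1, js[n - i]);
    free(js);
    return 0;
}
```

Both directions must draw from the PRNG in exactly the same order with the same seed; that is what makes the reverse replay recover the original bytes.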
** NO compatibility with =shuffle3=
=shuffle3='s ~drng~ PRNG algorithm uses an outdated global-state backend. We don't want to reuse this.
As a result, output from =shuffle3= and =shuffle3-lean= is different.
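As an illustration of the alternative to a global-state backend, the sketch below keeps all generator state in a caller-owned value, so independent generators never interfere and the same seed always yields the same stream. It uses xorshift64 purely as a stand-in; the file doesn't say which PRNG =shuffle3-lean= uses, and the names ~xorshift64_t~, ~xorshift64_new~ and ~xorshift64_next~ are hypothetical.

```c
#include <stdint.h>

/* Hypothetical generator: xorshift64, with all state in a caller-owned value
   instead of a global. Not necessarily the PRNG shuffle3-lean uses. */
typedef struct {
    uint64_t state; /* xorshift requires a non-zero state */
} xorshift64_t;

xorshift64_t xorshift64_new(uint64_t seed) {
    xorshift64_t g;
    g.state = seed ? seed : 0x9E3779B97F4A7C15ULL; /* avoid the all-zero state */
    return g;
}

uint64_t xorshift64_next(xorshift64_t *g) {
    uint64_t x = g->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    g->state = x;
    return x;
}
```

Because nothing is shared, advancing one generator has no effect on another, and each thread can own its own instance without locking.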
* Improvements
- *~70-80x* speedup over =shuffle3= 1.0
- Huge reduction in syscalls
- Takes advantage of the kernel's fs cache
- Can properly handle large files without core dumping
- Doesn't place huge buffers on the stack
** Performance
[[https://github.com/sharkdp/hyperfine][hyperfine]] reports a *700-800%* speedup over =v1=.
It's easy to see why.
*** V1 flamegraph
V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes, which causes massive syscall overhead.
[[./profiling/release-flame-old.svg]]
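To show why a file-backed pseudo-array is so expensive, here is a hypothetical adaptor in that spirit: every element read or write is its own system call, so a single swap costs four. It uses POSIX ~pread()~ and ~pwrite()~, which fold the seek into the call; a separate ~lseek()~ before each ~read()~ or ~write()~, as the description of v1 suggests, would roughly double the count. The type and function names (~file_array_t~, ~fa_open~, ~fa_swap~) are illustrative, not v1's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical file-backed "array" in the spirit of v1's pseudo-array adaptor:
   every element read or write is its own system call. */
typedef struct {
    int fd;
    unsigned long syscalls; /* number of pread()/pwrite() calls issued */
} file_array_t;

int fa_open(file_array_t *fa, const char *path) {
    fa->fd = open(path, O_RDWR | O_CREAT, 0600);
    fa->syscalls = 0;
    return fa->fd < 0 ? -1 : 0;
}

static int fa_get(file_array_t *fa, off_t i, unsigned char *out) {
    fa->syscalls++;
    return pread(fa->fd, out, 1, i) == 1 ? 0 : -1;
}

static int fa_set(file_array_t *fa, off_t i, unsigned char v) {
    fa->syscalls++;
    return pwrite(fa->fd, &v, 1, i) == 1 ? 0 : -1;
}

/* One swap costs four syscalls; a Fisher-Yates pass over n bytes makes n-1 swaps,
   so it issues about 4n syscalls in total. */
int fa_swap(file_array_t *fa, off_t a, off_t b) {
    unsigned char x, y;
    if (fa_get(fa, a, &x) != 0 || fa_get(fa, b, &y) != 0)
        return -1;
    if (fa_set(fa, a, y) != 0 || fa_set(fa, b, x) != 0)
        return -1;
    return 0;
}
```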
*** V2 flamegraph
V2, by contrast, uses a single ~mmap()~.
[[./profiling/release-flame.svg]]
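A minimal sketch of the single-~mmap()~ approach, assuming a Fisher-Yates shuffle with a stand-in xorshift64 PRNG: the whole file is mapped once with ~MAP_SHARED~ and shuffled as ordinary memory, so element accesses become plain loads and stores served from the kernel's page cache rather than individual syscalls. ~shuffle_file~ and ~shuffle_bytes~ are hypothetical names, not V2's actual functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical xorshift64 step; any seeded PRNG would do. */
static uint64_t xs64_next(uint64_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* In-place Fisher-Yates over a plain memory buffer. */
static void shuffle_bytes(unsigned char *buf, size_t n, uint64_t seed) {
    uint64_t s = seed ? seed : 1;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)(xs64_next(&s) % i);
        unsigned char t = buf[i - 1];
        buf[i - 1] = buf[j];
        buf[j] = t;
    }
}

/* Map the whole file once and shuffle it as ordinary memory. With MAP_SHARED,
   writes land in the kernel's page cache and are flushed back to the file. */
int shuffle_file(const char *path, uint64_t seed) {
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) { /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    shuffle_bytes(p, len, seed);
    int rc = msync(p, len, MS_SYNC);
    munmap(p, len);
    return rc;
}
```

Apart from setup and teardown (~open~, ~fstat~, ~mmap~, ~msync~, ~munmap~), the syscall count no longer depends on the number of swaps; page faults still occur as pages are touched, but the kernel handles them in page-sized units.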
** Memory usage
The [[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]] graph for =v1= shows extremely inefficient stack usage.
[[./profiling/old-mem.png]]
(The green is supposed to be a line, not a bar.)
This is due to how the unshuffler buffers RNG results. =v1= naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled.
It places this buffer on the stack frame of a function that is called multiple times, causing heavy and inefficient stack usage.
This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.
*** V2 improvement
The ~memusage~ graph for =v2= is much saner.
[[./profiling/mem.png]]
=v2= instead allocates this buffer on the heap. Note the stable stack and heap usage.
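The file doesn't spell out the buffer layout. Assuming one 64-bit index per byte of input (which would account for the 8x figure on a 64-bit platform), a 1 GiB file needs about 8 GiB of buffer, far beyond the default stack limit (commonly 8 MiB on Linux). The sketch below shows a heap allocation that checks the size arithmetic for overflow and reports failure as ~NULL~ instead of overrunning the stack; ~index_buffer_bytes~ and ~alloc_index_buffer~ are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for one size_t per element, or 0 if that would overflow size_t. */
size_t index_buffer_bytes(size_t n_elems) {
    if (n_elems > SIZE_MAX / sizeof(size_t))
        return 0;
    return n_elems * sizeof(size_t);
}

/* Heap-allocate the RNG-results buffer. Unlike a VLA on the stack, an
   oversized request fails with NULL instead of overflowing the stack. */
size_t *alloc_index_buffer(size_t n_elems) {
    size_t bytes = index_buffer_bytes(n_elems);
    if (bytes == 0)
        return NULL; /* zero elements or size overflow */
    return malloc(bytes);
}
```

The caller can then fail gracefully (or fall back to another strategy) when the allocation is refused, rather than segfaulting partway through an unshuffle.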
* Todo
- [X] impl rng
- [X] impl shuffling
- [ ] impl out-of-place shuffling
- [-] arg parsing and dispatch
  - [X] simple parsing
  - [ ] complex parsing