shuffle3/TODO.org

* shuffle3-lean
  Redegisn/upgrade of =shuffle3=

* Goals
  - [X] Functioning in-place shuffle/unshuffle
    - [X] Shuffle
    - [X] Unshuffle
  - [X] Usable in-place s/us from command line
    - [X] Shuffle
    - [X] Unshuffle
  - [ ] Functioning out-of-place/in-memory shuffle/unshuffle
    - [ ] Shuffle
    - [ ] Unshuffle
  - [ ] Usable out-of-place s/us from command line
    - [ ] Shuffle
    - [ ] Unshuffle

** NO compatibility with =shuffle3=
 =shuffle3='s ~drng~ PRNG algorithm uses an outdated global state backend. We don't want to reuse this.
   As a result, output from =shuffle3= and =shuffle3-lean= is different.

* Improvements
  - *~70-80x* speedup from shuffle3 1.0
  - Huge reduction in syscalls
  - Takes advantage of the kernel's fs cache
  - Can properly handle large files without core dumping
  - Doesn't dump huge amounts of trash onto each stack frame

** Performance
   [[https://github.com/sharkdp/hyperfine][hyperfine]]  reports a *700-800%* speedup over =v1=.

   It's easy to see why.
*** V1 flamegraph
    V1 uses a pesudo-array adaptor to perform filesystem reads, seeks, and writes. This causes a massive syscall overhead.
    [[./profiling/release-flame-old.svg]]
*** V2 flamegraph
    Whereas V2 uses a single ~mmap()~.
    [[./profiling/release-flame.svg]]
    
** Memory usage
   The [[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]] graph for =v1= shows extremely inefficient stack usage.
   [[./profiling/old-mem.png]]
   ( the green is supposed to be a line, not a bar)
   This is due to how the unshuffler buffers RNG results.

 =v1= naively used VLAs to store this buffer, which can baloon to 8 times the size of the file being unshuffled.
   It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.

   This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.

*** V2 improvement
    The ~memusage~ graph for =v2= is a lot more sane.
    [[./profiling/mem.png]]
    
 ~v2~ instead allocates this buffer on the heap. Note the stable stack and heap usage.
* Todo
  - [X] impl rng
  - [X] impl shuffling
  - [ ] impl out-of-place shuffling
  - [-] arg parsing and dispatch
    - [X] simple parsing
    - [ ] complex parsing
ready for production 4 years ago			`* shuffle3-lean`
			`Redegisn/upgrade of =shuffle3=`

			`* Goals`
			`- [X] Functioning in-place shuffle/unshuffle`
			`- [X] Shuffle`
			`- [X] Unshuffle`
			`- [X] Usable in-place s/us from command line`
			`- [X] Shuffle`
			`- [X] Unshuffle`
			`- [ ] Functioning out-of-place/in-memory shuffle/unshuffle`
			`- [ ] Shuffle`
			`- [ ] Unshuffle`
			`- [ ] Usable out-of-place s/us from command line`
			`- [ ] Shuffle`
			`- [ ] Unshuffle`

			`** NO compatibility with =shuffle3=`
			`=shuffle3='s ~drng~ PRNG algorithm uses an outdated global state backend. We don't want to reuse this.`
			`As a result, output from =shuffle3= and =shuffle3-lean= is different.`
fix hang on small files 4 years ago
			`* Improvements`
			`- ~70-80x speedup from shuffle3 1.0`
ready for production 4 years ago			`- Huge reduction in syscalls`
			`- Takes advantage of the kernel's fs cache`
fix hang on small files 4 years ago			`- Can properly handle large files without core dumping`
ready for production 4 years ago			`- Doesn't dump huge amounts of trash onto each stack frame`
fix hang on small files 4 years ago
update TODO 4 years ago			`** Performance`
update README 4 years ago			`[[https://github.com/sharkdp/hyperfine][hyperfine]] reports a 700-800% speedup over =v1=.`
update TODO 4 years ago
			`It's easy to see why.`
			`*** V1 flamegraph`
			`V1 uses a pesudo-array adaptor to perform filesystem reads, seeks, and writes. This causes a massive syscall overhead.`
			`[[./profiling/release-flame-old.svg]]`
			`*** V2 flamegraph`
			`Whereas V2 uses a single ~mmap()~.`
			`[[./profiling/release-flame.svg]]`

			`** Memory usage`
update README 4 years ago			`The [[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]] graph for =v1= shows extremely inefficient stack usage.`
update TODO 4 years ago			`[[./profiling/old-mem.png]]`
			`( the green is supposed to be a line, not a bar)`
			`This is due to how the unshuffler buffers RNG results.`

			`=v1= naively used VLAs to store this buffer, which can baloon to 8 times the size of the file being unshuffled.`
			`It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.`

			`This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.`

			`*** V2 improvement`
			`The ~memusage~ graph for =v2= is a lot more sane.`
			`[[./profiling/mem.png]]`

			`~v2~ instead allocates this buffer on the heap. Note the stable stack and heap usage.`
todo 4 years ago			`* Todo`
ready for production 4 years ago			`- [X] impl rng`
			`- [X] impl shuffling`
			`- [ ] impl out-of-place shuffling`
			`- [-] arg parsing and dispatch`
			`- [X] simple parsing`
			`- [ ] complex parsing`