diff --git a/README.md b/README.md
index 64d7007..07a1855 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,43 @@ $ shuffle3 -u file
 ## Other options
 Run with `-h` for more options.
 
+# Improvements from `v1`
+* **~70-80x** speedup from shuffle3 v1.0
+* Huge reduction in syscalls
+* Takes advantage of the kernel's fs cache
+* Can properly handle large files without core dumping
+* Doesn't dump huge amounts of trash onto each stack frame
+
+## Performance
+[hyperfine](https://github.com/sharkdp/hyperfine) reports a **700-800%** speedup over `v1`.
+It's easy to see why.
+
+### V1 flamegraph
+V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes. This causes massive syscall overhead.
+![](./profiling/release-flame-old.svg)
+
+### V2 flamegraph
+V2, by contrast, uses a single `mmap()` call.
+![](./profiling/release-flame.svg)
+
+## Memory usage
+The [memusage](https://www.systutorials.com/docs/linux/man/1-memusage/) graph for `v1` shows extremely inefficient stack usage.
+![](./profiling/old-mem.png)
+( the green is supposed to be a line, not a bar )
+This is due to how the unshuffler buffers RNG results.
+
+`v1` naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled.
+It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.
+
+This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.
+
+### V2 improvement
+The `memusage` graph for `v2` is a lot more sane.
+![](./profiling/mem.png)
+
+`v2` instead allocates this buffer on the heap. Note the stable stack and heap usage.
+
+
 # Building
 Run `make` to build the normal binary. It will output to `shuffle3-release`.
 
diff --git a/TODO.org b/TODO.org
index 5d9dd3e..c235a12 100644
--- a/TODO.org
+++ b/TODO.org
@@ -27,7 +27,7 @@
  - Doesn't dump huge amounts of trash onto each stack frame
 
 ** Performance
- ~[[https://github.com/sharkdp/hyperfine][hyperfine]]~ reports a *700-800%* speedup over =v1=.
+ [[https://github.com/sharkdp/hyperfine][hyperfine]] reports a *700-800%* speedup over =v1=.
  It's easy to see why.
 
 *** V1 flamegraph
@@ -38,7 +38,7 @@
  [[./profiling/release-flame.svg]]
 
 ** Memory usage
- The ~[[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]]~ graph for =v1= shows extremely inefficient stack usage.
+ The [[https://www.systutorials.com/docs/linux/man/1-memusage/][memusage]] graph for =v1= shows extremely inefficient stack usage.
  [[./profiling/old-mem.png]]
  ( the green is supposed to be a line, not a bar)
  This is due to how the unshuffler buffers RNG results.
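
Review note: the README text in this diff credits the speedup to a single `mmap()` and the stable memory profile to moving the unshuffler's RNG buffer from a VLA to the heap. The sketch below illustrates both ideas in C; it is not shuffle3's actual code. The function names (`shuffle_bytes`, `unshuffle_bytes`, `process_file`) and the xorshift generator are hypothetical stand-ins, and the Fisher-Yates scheme is an assumption about how the shuffle works. With one `size_t` pick recorded per byte, the replay buffer is about 8 times the file size on a 64-bit target, which matches the figure quoted in the README.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in PRNG (xorshift64); NOT shuffle3's real generator. */
static uint64_t next_rand(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

static void swap_bytes(unsigned char *a, size_t i, size_t j)
{
    unsigned char t = a[i];
    a[i] = a[j];
    a[j] = t;
}

/* Fisher-Yates shuffle over a byte buffer (assumed scheme). */
void shuffle_bytes(unsigned char *data, size_t len, uint64_t seed)
{
    assert(seed != 0); /* xorshift state must be nonzero */
    for (size_t i = len; i > 1; i--)
        swap_bytes(data, i - 1, (size_t)(next_rand(&seed) % i));
}

/* Inverse: replay the RNG into a heap buffer, then undo the swaps in
 * reverse order. A VLA here would place len * sizeof(size_t) bytes on
 * the stack; malloc keeps stack usage constant and can fail gracefully. */
int unshuffle_bytes(unsigned char *data, size_t len, uint64_t seed)
{
    assert(seed != 0);
    if (len < 2)
        return 0;
    size_t *picks = malloc((len - 1) * sizeof *picks);
    if (picks == NULL)
        return -1;
    size_t n = 0;
    for (size_t i = len; i > 1; i--)
        picks[n++] = (size_t)(next_rand(&seed) % i);
    /* The k-th recorded swap paired index (len - 1 - k) with picks[k]. */
    while (n-- > 0)
        swap_bytes(data, len - 1 - n, picks[n]);
    free(picks);
    return 0;
}

/* Map the whole file once and work on it in place: no per-element
 * read/seek/write syscalls, and the kernel page cache does the I/O. */
int process_file(const char *path, uint64_t seed, int undo)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *data =
        mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;
    int rc = 0;
    if (undo)
        rc = unshuffle_bytes(data, len, seed);
    else
        shuffle_bytes(data, len, seed);
    if (munmap(data, len) != 0)
        rc = -1;
    return rc;
}
```

Calling `process_file(path, seed, 0)` and then `process_file(path, seed, 1)` with the same nonzero seed should restore the file's original contents. Only the unshuffle direction needs the replay buffer, which fits the README's remark that shuffling a large file could succeed under `v1` while unshuffling the same file crashed.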