|
|
|
@@ -19,6 +19,43 @@ $ shuffle3 -u file
|
|
|
|
|
## Other options
|
|
|
|
|
Run with `-h` for more options.
|
|
|
|
|
|
|
|
|
|
# Improvements from `v1`
|
|
|
|
|
* **~70-80x** speedup from shuffle3 v1.0
|
|
|
|
|
* Huge reduction in syscalls
|
|
|
|
|
* Takes advantage of the kernel's fs cache
|
|
|
|
|
* Can properly handle large files without core dumping
|
|
|
|
|
* Doesn't dump huge amounts of trash onto each stack frame
|
|
|
|
|
|
|
|
|
|
## Performance
|
|
|
|
|
[hyperfine](https://github.com/sharkdp/hyperfine) reports a **700-800%** speedup over `v1`.
|
|
|
|
|
It's easy to see why.
|
|
|
|
|
|
|
|
|
|
### V1 flamegraph
|
|
|
|
|
V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes. This causes massive syscall overhead.
|
|
|
|
|
[Image: V1 flamegraph]
|
|
|
|
|
|
|
|
|
|
### V2 flamegraph
|
|
|
|
|
V2 instead maps the file with a single `mmap()` call.
|
|
|
|
|
[Image: V2 flamegraph]
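As a rough illustration of why a single mapping avoids per-byte syscalls, here is a minimal POSIX sketch. It is not shuffle3's actual code: `reverse_file_in_place` is a hypothetical helper, and reversing the bytes stands in for whatever permutation the real shuffler applies. Once the file is mapped, every swap is an ordinary memory access, with no `read()`, `lseek()`, or `write()` per element.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of shuffle3): map `path` read-write,
 * reverse its bytes in place, then flush and unmap.
 * Returns 0 on success, -1 on failure. */
int reverse_file_in_place(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    if (len == 0) {
        close(fd);
        return 0; /* mmap() rejects zero-length mappings */
    }

    /* One syscall maps the whole file; MAP_SHARED writes reach the file. */
    unsigned char *data = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    /* Plain memory accesses from here on: no syscall per swap. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        unsigned char tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }

    int rc = msync(data, len, MS_SYNC);
    if (munmap(data, len) < 0)
        rc = -1;
    return rc;
}
```

Because the pages are backed by the kernel's page cache, repeated accesses to the same region are served from memory rather than going back to the disk.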
|
|
|
|
|
|
|
|
|
|
## Memory usage
|
|
|
|
|
The [memusage](https://www.systutorials.com/docs/linux/man/1-memusage/) graph for `v1` shows extremely inefficient stack usage.
|
|
|
|
|
[Image: memusage graph for v1]
|
|
|
|
|
(The green is supposed to be a line, not a bar.)
|
|
|
|
|
This is due to how the unshuffler buffers RNG results.
|
|
|
|
|
|
|
|
|
|
`v1` naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled.
|
|
|
|
|
It places this buffer on the stack frame of a function that is called multiple times, so stack usage is both very large and wasteful.
|
|
|
|
|
|
|
|
|
|
This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.
|
|
|
|
|
|
|
|
|
|
### V2 improvement
|
|
|
|
|
The `memusage` graph for `v2` is much saner.
|
|
|
|
|
[Image: memusage graph for v2]
|
|
|
|
|
|
|
|
|
|
`v2` instead allocates this buffer on the heap. Note the stable stack and heap usage.
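The difference can be sketched in C. This is an illustration under stated assumptions, not shuffle3's implementation: the xorshift generator, the Fisher-Yates layout, and the names `shuffle_bytes` and `unshuffle_bytes` are all hypothetical. It shows why unshuffling has to buffer RNG results (the swaps must be replayed in reverse order) and how a heap allocation keeps that buffer off the stack. Storing one `size_t` per swap costs 8 bytes per file byte on a typical 64-bit target, which would be consistent with the 8x figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy xorshift64 generator: a stand-in, not shuffle3's actual RNG. */
static uint64_t next_u64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

static void swap_bytes(unsigned char *a, unsigned char *b)
{
    unsigned char tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Fisher-Yates shuffle driven by `seed` (must be non-zero for xorshift).
 * Shuffling needs no buffer: each pick is used as soon as it is drawn. */
void shuffle_bytes(unsigned char *data, size_t len, uint64_t seed)
{
    assert(seed != 0);
    for (size_t i = len; i > 1; i--) {
        size_t j = (size_t)(next_u64(&seed) % i);
        swap_bytes(&data[i - 1], &data[j]);
    }
}

/* Undo shuffle_bytes(). Returns 0 on success, -1 if allocation fails. */
int unshuffle_bytes(unsigned char *data, size_t len, uint64_t seed)
{
    assert(seed != 0);
    if (len < 2)
        return 0;
    if (len - 1 > SIZE_MAX / sizeof(size_t))
        return -1; /* buffer size would overflow */

    /* Heap allocation: a large file cannot overflow the stack here,
     * unlike a VLA such as `size_t picks[len - 1];`. */
    size_t *picks = malloc((len - 1) * sizeof *picks);
    if (picks == NULL)
        return -1;

    /* Record every pick in the order the shuffle drew them. */
    size_t n = 0;
    for (size_t i = len; i > 1; i--)
        picks[n++] = (size_t)(next_u64(&seed) % i);

    /* Replay the swaps last-to-first; each swap is its own inverse. */
    for (size_t i = 2; i <= len; i++)
        swap_bytes(&data[i - 1], &data[picks[--n]]);

    free(picks);
    return 0;
}
```

The asymmetry is visible in the sketch: `shuffle_bytes` needs no extra storage, while `unshuffle_bytes` must hold every pick before it can apply the first inverse swap. That matches the observation that unshuffling a large file could fail in `v1` even when shuffling a file of the same size succeeded.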
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Building
|
|
|
|
|
Run `make` to build the normal binary. It will output to `shuffle3-release`.
|
|
|
|
|
|
|
|
|
|