shuffle3-lean

Redesign/upgrade of shuffle3

Goals

  • Functioning in-place shuffle/unshuffle

    • Shuffle
    • Unshuffle
  • Usable in-place shuffle/unshuffle from command line

    • Shuffle
    • Unshuffle
  • Functioning out-of-place/in-memory shuffle/unshuffle

    • Shuffle
    • Unshuffle
  • Usable out-of-place shuffle/unshuffle from command line

    • Shuffle
    • Unshuffle

NO compatibility with shuffle3

shuffle3's drng PRNG algorithm uses an outdated global state backend. We don't want to reuse this. As a result, output from shuffle3 and shuffle3-lean is different.
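
To illustrate the difference between a global-state and an explicit-state generator, here is a minimal sketch. It is not the RNG that shuffle3-lean actually uses: the `rng_t` type, the `rng_seed`/`rng_next` names, and the choice of xorshift64 are all assumptions made for the example. The point is only that the state lives in a value owned by the caller, so two generators never interfere with each other.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical explicit-state generator (xorshift64).
 * Assumption for illustration only; not shuffle3-lean's real RNG. */
typedef struct {
    uint64_t s; /* all generator state lives here, not in a global */
} rng_t;

static void rng_seed(rng_t *r, uint64_t seed)
{
    /* xorshift must never hold an all-zero state, so substitute a constant */
    r->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

static uint64_t rng_next(rng_t *r)
{
    assert(r->s != 0);
    uint64_t x = r->s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    r->s = x;
    return x;
}
```

Two `rng_t` values seeded identically yield the same sequence regardless of how often the other one is advanced, which is what a shuffle/unshuffle pair needs.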

Improvements

  • ~70-80x speedup from shuffle3 1.0
  • Huge reduction in syscalls
  • Takes advantage of the kernel's fs cache
  • Can properly handle large files without core dumping
  • Doesn't dump huge amounts of trash onto each stack frame

Performance

hyperfine reports a 700-800% speedup over v1.

It's easy to see why.

V1 flamegraph

V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes. This causes massive syscall overhead. /flanchan/shuffle3/src/commit/16ae82f05d5deec2d56b29e14322309cf2888056/profiling/release-flame-old.svg

V2 flamegraph

V2 instead uses a single mmap(). /flanchan/shuffle3/src/commit/16ae82f05d5deec2d56b29e14322309cf2888056/profiling/release-flame.svg
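
As a rough illustration of the single-mmap() approach, the sketch below maps a file read/write and runs a Fisher-Yates shuffle directly on the mapped bytes, so element accesses are plain memory operations rather than a read/seek/write per element. The function names, the byte-wise element size, and the small xorshift64 generator are assumptions for the example and are not taken from the shuffle3 source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stand-in generator (xorshift64); an assumption, not shuffle3's RNG. */
static uint64_t next_u64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *s = x;
    return x;
}

/* Fisher-Yates over a byte buffer. The modulo introduces a slight bias,
 * which is acceptable for a sketch. */
static void shuffle_bytes(unsigned char *p, size_t n, uint64_t seed)
{
    assert(p != NULL || n == 0);
    uint64_t s = seed ? seed : 1; /* xorshift state must be non-zero */
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)(next_u64(&s) % i);
        unsigned char t = p[i - 1];
        p[i - 1] = p[j];
        p[j] = t;
    }
}

/* Hypothetical helper: shuffle a file in place through one mapping.
 * Returns 0 on success, -1 on failure. */
static int shuffle_file_in_place(const char *path, uint64_t seed)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *map =
        mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    shuffle_bytes(map, len, seed);

    int rc = msync(map, len, MS_SYNC); /* flush dirty pages to the file */
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}
```

With `MAP_SHARED`, the kernel's page cache backs the mapping, so the only syscalls are the handful needed for setup and teardown.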

Memory usage

The memusage graph for v1 shows extremely inefficient stack usage. /flanchan/shuffle3/src/commit/16ae82f05d5deec2d56b29e14322309cf2888056/profiling/old-mem.png (The green is supposed to be a line, not a bar.) This is due to how the unshuffler buffers RNG results.

v1 naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled. It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.

This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.

V2 improvement

The memusage graph for v2 is a lot more sane. /flanchan/shuffle3/src/commit/16ae82f05d5deec2d56b29e14322309cf2888056/profiling/mem.png

v2 instead allocates this buffer on the heap. Note the stable stack and heap usage.
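
The following sketch shows why an unshuffler needs such a buffer and how it can live on the heap. It assumes a byte-wise Fisher-Yates shuffle and a stand-in xorshift64 generator; the names `shuffle`, `unshuffle`, and `next_u64` are hypothetical and not taken from the shuffle3 source. To undo the swaps, every random index has to be generated first and then replayed in reverse. With one `size_t` per byte, that buffer is about 8 times the input size on a typical 64-bit platform, which is far too large for a VLA on the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in generator (xorshift64); an assumption, not shuffle3's RNG. */
static uint64_t next_u64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *s = x;
    return x;
}

static void swap_bytes(unsigned char *a, unsigned char *b)
{
    unsigned char t = *a;
    *a = *b;
    *b = t;
}

/* Forward shuffle: indices are consumed as they are generated,
 * so no buffer is needed. */
static void shuffle(unsigned char *p, size_t n, uint64_t seed)
{
    uint64_t s = seed ? seed : 1; /* xorshift state must be non-zero */
    for (size_t i = n; i > 1; i--)
        swap_bytes(&p[i - 1], &p[next_u64(&s) % i]);
}

/* Reverse shuffle: all indices are generated up front into a heap
 * buffer, then the swaps are replayed last-to-first.
 * Returns 0 on success, -1 if the buffer cannot be allocated. */
static int unshuffle(unsigned char *p, size_t n, uint64_t seed)
{
    if (n < 2)
        return 0;
    if (n - 1 > SIZE_MAX / sizeof(size_t))
        return -1; /* allocation size would overflow */

    size_t *idx = malloc((n - 1) * sizeof *idx); /* heap, not a VLA */
    if (idx == NULL)
        return -1;

    uint64_t s = seed ? seed : 1;
    size_t k = 0;
    for (size_t i = n; i > 1; i--)
        idx[k++] = (size_t)(next_u64(&s) % i);

    /* The last forward swap used i == 2, so undo that one first. */
    for (size_t i = 2; i <= n; i++)
        swap_bytes(&p[i - 1], &p[idx[--k]]);

    assert(k == 0); /* every buffered index was replayed exactly once */
    free(idx);
    return 0;
}
```

Unlike a VLA, a failed `malloc()` is reported to the caller instead of overflowing the stack, and the buffer is allocated once per call rather than once per stack frame.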

Todo

  • impl rng
  • impl shuffling
  • impl out-of-place shuffling
  • arg parsing and dispatch

    • simple parsing
    • complex parsing