3 pass byte shuffler/unshuffler
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Avril f648646317
fix make failing on clean builds
6 months ago
include added _FS_SPILL_BUFFER=MAP 7 months ago
old Update to 2.0! 7 months ago
profiling memusage on arm 7 months ago
src added _FS_SPILL_BUFFER=MAP 7 months ago
.gitignore update TODO 7 months ago
LICENSE Create LICENSE 7 months ago
Makefile fix make failing on clean builds 6 months ago
README.md added _FS_SPILL_BUFFER=MAP 7 months ago
TODO.org update README 7 months ago
test.sh improve test script and makefile 7 months ago


shuffle3-lean - Improved 3 stage byte shuffler

Deterministically and reversably shuffle a file's bytes around.


Shuffle a file in place

$ shuffle3 -s file


Unshuffle a file in place

$ shuffle3 -u file

Other options

Run with -h for more options.

Improvements from v1

  • ~70-80x speedup from shuffle3 v1.0
  • Huge reduction in syscalls
  • Takes advantage of the kernel's fs cache
  • Can properly handle large files without core dumping
  • Doesn't dump huge amounts of trash onto each stack frame


hyperfine reports a 700-800% speedup over v1. It's easy to see why.

V1 flamegraph

V1 uses a pesudo-array adaptor to perform filesystem reads, seeks, and writes. This causes a massive syscall overhead.

V2 flamegraph

Whereas V2 uses a single mmap().

Memory usage

The memusage graph for v1 shows extremely inefficient stack usage. ( the green is supposed to be a line, not a bar! )

This is due to how the unshuffler buffers RNG results.

v1 naively used VLAs to store this buffer, which can baloon to 8 times the size of the file being unshuffled. It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.

This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.

V2 improvement

The memusage graph for v2 is a lot more sane.

v2 instead allocates this buffer on the heap. Note the stable stack and heap usage.


Run make to build the normal binary. It will output to shuffle3-release.

Release target

The release (default) target uses the variables RELEASE_CFLAGS, RELEASE_CXXFLAGS and RELEASE_LDFLAGS to specify opitimisations, as well as the OPT_FLAGS variable. These can be set by you if you wish.


The default OPT_FLAGS contains the flag -march=native. This may be underisable for you, in which case set the variable or modify the makefile to remove it.

Debug target

To build with debug information, run make debug. Extra debug flags can be provided with the DEBUG_CFLAGS, DEBUG_CXXFLAGS and DEBUG_LDFLAGS variables which have default values in the Makefile.

The build and unstripped binary will be shuffle3-debug.

PGO target

To build with Profile Guided Optimisation run make pgo, the stripped and optimised binary will be output to shuffle3-pgo.


Before switching between release and debug targets, remember to run make clean. To disable stripping of release build binaries, run with make STRIP=: release

Compile-time flags

There are some build-time flags you can switch while building by appending to the FEATURE_FLAGS variable.

Flag Description
DEBUG Pretend we're building a debug release even though we're not.
_FS_SPILL_BUFFER Spill buffers into a file if they grow over a threshold. Can cause massive slowdowns but prevent OOMs while unshuffling on systems with low available memory. See shuffle3.h for more details
_FS_SPILL_BUFFER=DYN Same as above except allocates memory dynamically. Might be faster.
_FS_SPILL_BUFFER=MAP Same as above except it calls fallocate() and mmap() to prodive a buffer of the full size needed. Is usually the fastest of the options for _FS_SPILL_BUFFER and is preferrable if possible.

Gentoo ebuild

There is a gentoo ebuild for this project in the overlay test-overlay. direct link


GPL'd with <3