3 pass byte shuffler/unshuffler
Avril 295da7df02
Started branch "progress", for working on timed-out progress indicator. Current build succeeds but does not use progress indicators at all.
2 years ago

shuffle3-lean - Improved 3 stage byte shuffler

Deterministically and reversibly shuffle a file's bytes around.

Shuffling

Shuffle a file in place

$ shuffle3 -s file

Unshuffling

Unshuffle a file in place

$ shuffle3 -u file

Other options

Run with -h for more options.

Improvements from v1

  • ~70-80x speedup from shuffle3 v1.0
  • Huge reduction in syscalls
  • Takes advantage of the kernel's fs cache
  • Can properly handle large files without core dumping
  • Doesn't dump huge amounts of trash onto each stack frame

Performance

hyperfine reports a 700-800% speedup over v1. It's easy to see why.

V1 flamegraph

V1 uses a pseudo-array adaptor to perform filesystem reads, seeks, and writes. This causes a massive syscall overhead.

V2 flamegraph

V2, by contrast, uses a single mmap().
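To illustrate the single-mmap() approach, here is a minimal sketch, not the project's actual code. The names `with_mapped_file`, `byte_op` and `reverse_bytes` are hypothetical and exist only for this example. It maps the whole file once, edits the bytes in memory, and lets the kernel write the pages back, so there is no read/seek/write call per byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback type: any in-place operation on the file's bytes. */
typedef void (*byte_op)(unsigned char *data, size_t len);

/* Map the whole file read/write, run `op` on it, then flush and unmap.
 * Returns 0 on success and -1 on failure. */
int with_mapped_file(const char *path, byte_op op)
{
    assert(path != NULL && op != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; } /* nothing to map */
    size_t len = (size_t)st.st_size;

    unsigned char *data = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED) return -1;

    op(data, len); /* plain memory accesses, no per-byte syscalls */

    int rc = msync(data, len, MS_SYNC); /* write modified pages back */
    if (munmap(data, len) != 0) rc = -1;
    return rc;
}

/* Example operation for the sketch: reverse the bytes in place. */
void reverse_bytes(unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        unsigned char tmp = data[i];
        data[i] = data[len - 1 - i];
        data[len - 1 - i] = tmp;
    }
}
```

Calling `with_mapped_file("file", reverse_bytes)` would reverse the file in place. A shuffler would pass its own operation instead.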

Memory usage

The memusage graph for v1 shows extremely inefficient stack usage (the green is supposed to be a line, not a bar!).

This is due to how the unshuffler buffers RNG results.

v1 naively used VLAs to store this buffer, which can balloon to 8 times the size of the file being unshuffled. It dumps this massive buffer onto the stack frame of a function that is called multiple times, causing massive and inefficient stack usage.

This can cause a segfault when attempting to unshuffle a large file, while shuffling a file of the same size might succeed.

V2 improvement

The memusage graph for v2 is a lot more sane.

v2 instead allocates this buffer on the heap. Note the stable stack and heap usage.
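The sketch below shows why an unshuffler might need to buffer RNG results at all, and what moving that buffer to the heap looks like. It is an illustration under assumptions, not shuffle3's algorithm: it uses a single Fisher-Yates pass and an xorshift64 generator as stand-ins, and the names `sketch_shuffle` and `sketch_unshuffle` are hypothetical. Storing one `size_t` per byte would be consistent with the 8x figure on a 64-bit system, but that reading is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* xorshift64: a small deterministic PRNG, used here only as a stand-in. */
static uint64_t next_u64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

static void swap_bytes(unsigned char *a, unsigned char *b)
{
    unsigned char tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Forward pass: Fisher-Yates driven by a seeded PRNG, so the same seed
 * always produces the same permutation. */
void sketch_shuffle(unsigned char *data, size_t len, uint64_t seed)
{
    assert(data != NULL || len == 0);
    uint64_t state = seed ? seed : 1; /* xorshift state must be non-zero */
    for (size_t i = len; i > 1; i--) {
        size_t j = (size_t)(next_u64(&state) % i);
        swap_bytes(&data[i - 1], &data[j]);
    }
}

/* Reverse pass: the swaps must be undone last-to-first, but the PRNG only
 * runs forwards, so every pick is recorded first. The record lives on the
 * heap; a VLA such as `size_t picks[len - 1]` would put it on the stack. */
int sketch_unshuffle(unsigned char *data, size_t len, uint64_t seed)
{
    assert(data != NULL || len == 0);
    if (len < 2) return 0;

    size_t *picks = malloc((len - 1) * sizeof *picks);
    if (picks == NULL) return -1;

    uint64_t state = seed ? seed : 1;
    size_t n = 0;
    for (size_t i = len; i > 1; i--)
        picks[n++] = (size_t)(next_u64(&state) % i);

    for (size_t i = 2; i <= len; i++) {
        n--;
        swap_bytes(&data[i - 1], &data[picks[n]]);
    }

    free(picks);
    return 0;
}
```

With a heap buffer, a failed allocation is reported as an ordinary error return, whereas an oversized VLA typically crashes the process.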

Building

Run make to build the normal binary. It will output to shuffle3-release.

Release target

The release (default) target uses the variables RELEASE_CFLAGS, RELEASE_CXXFLAGS and RELEASE_LDFLAGS to specify optimisations, as well as the OPT_FLAGS variable. These can be set by you if you wish.

Note

The default OPT_FLAGS contains the flag -march=native. This may be undesirable for you, in which case set the variable or modify the Makefile to remove it.

Debug target

To build with debug information, run make debug. Extra debug flags can be provided with the DEBUG_CFLAGS, DEBUG_CXXFLAGS and DEBUG_LDFLAGS variables which have default values in the Makefile.

The built, unstripped binary will be shuffle3-debug.

PGO target

To build with Profile Guided Optimisation, run make pgo. The stripped and optimised binary will be output to shuffle3-pgo.

Notes

Before switching between release and debug targets, remember to run make clean.

To disable stripping of release build binaries, build with make STRIP=: release

Compile-time flags

There are some build-time flags you can switch while building by appending to the FEATURE_FLAGS variable.

| Flag | Description |
|------|-------------|
| DEBUG | Pretend we're building a debug release even though we're not. |
| _FS_SPILL_BUFFER | Spill buffers into a file if they grow over a threshold. Can cause massive slowdowns but prevents OOMs while unshuffling on systems with low available memory. See shuffle3.h for more details. |
| _FS_SPILL_BUFFER=DYN | Same as above, except it allocates memory dynamically. Might be faster. |
| _FS_SPILL_BUFFER=MAP | Same as above, except it calls fallocate() and mmap() to provide a buffer of the full size needed. Usually the fastest of the _FS_SPILL_BUFFER options, and preferable if possible. |
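As a rough illustration of the file-backed buffer idea behind _FS_SPILL_BUFFER=MAP, the sketch below reserves a temporary file of the full size and maps it. It is not taken from shuffle3.h. The type `spill_buf`, the functions `spill_buf_open` and `spill_buf_close`, and the /tmp path are all hypothetical, and it uses posix_fallocate(), the portable counterpart, in place of the Linux-specific fallocate() named above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle for a buffer whose pages are backed by a file
 * rather than by anonymous memory. */
typedef struct {
    void  *ptr;
    size_t len;
} spill_buf;

/* Reserve `len` bytes in an unlinked temporary file and map them.
 * Returns 0 on success and -1 on failure. */
int spill_buf_open(spill_buf *out, size_t len)
{
    assert(out != NULL && len > 0);

    char path[] = "/tmp/spill-sketch-XXXXXX"; /* assumed location */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path); /* storage is released once the mapping goes away */

    /* Reserve the full size up front, so a full disk is reported here
     * instead of as a fault on some later write to the mapping. */
    if (posix_fallocate(fd, 0, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps the file alive */
    if (p == MAP_FAILED) return -1;

    out->ptr = p;
    out->len = len;
    return 0;
}

/* Unmap the buffer and reset the handle. */
void spill_buf_close(spill_buf *buf)
{
    assert(buf != NULL);
    if (buf->ptr != NULL) munmap(buf->ptr, buf->len);
    buf->ptr = NULL;
    buf->len = 0;
}
```

The caller then uses `ptr` like ordinary memory. Under memory pressure the kernel can write these pages out to the backing file, which is presumably what lets such a buffer avoid OOMs.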

Gentoo ebuild

There is a Gentoo ebuild for this project in the overlay test-overlay.

License

GPL'd with <3