//TODO: In the explicitly threaded version, use an atomic (atomic.h) 64-bit integer `store()` to write those 8 bytes to `out`, which will be re-cast as `uint64_t out[restrict 1]`