Compare commits

...

33 Commits

Author SHA1 Message Date
Avril f4f7aafa32
Feature-gated `-exec/{}` as "exec". Added feature `mode-flags` to enable/disable collect-noallocate collect-preallocate memfile-total-jemalloc.png features that involve runtime flags.
2 years ago
Avril 5c673ae3c4
-exec/{} feature is ready to merge
2 years ago
Avril 9aba1f43a1
Fixed `-exec` to use real `dup()`'d file instead of copying to a new pipe
2 years ago
Avril 31cfee9989
Fixed -exec{}: dup()'d file descriptor was being closed before the process could access it. Fixed -exec (hack ver.): Ditto.
2 years ago
Avril 582bfc0dad
Made -exec (stdin ver) work via piping hack (XXX: We need to find how to pass file directly.) (TODO: -exec{} still doesn't work: `No such file or directory` error when accessing /dev/fd/{fd} *and* /proc/{pid}/{fd}?? Idk why...)
2 years ago
Avril bc121420b8
Changed return type of `spawn_from_sync()` to be an iterator of `Result<Option<i32>>` so the caller can decide what to do if the child terminates without an exit code. Also changed to `eyre::Result<>` to report the specific failure and child process' index.
2 years ago
Avril 79721444ba
Implemented `-exec/{}` implementation functionality.
2 years ago
Avril 715fa4d5a8
Added simple fatal error message verbosity levels controllable by environment variable at runtime (`RUST_VERBOSE =~ /1|v|verbose/i`), and compile-time (`NO_RT_ERROR_CTL`, `DEFAULT_ERROR =~ /1|v|verbose/i`.) Default is simple error messages at compile and runtime.
2 years ago
Avril f918d5f6e1
Prepared for replacing the return type, from the plain `Options` of normal operation to `enum Mode`, which will facilitate returning a `--help` or other special mode case. We can use `impl From<Options> for Mode` and work with this to make the migration of return types easier.
2 years ago
Avril 227abc0d7d
args: Added visitor-pattern argument parsing. Fixed bug in `try_parse_for!()` which `continue`d on the wrong branch. Improved specific and general error messages regarding arguments. Parsing for `-exec/{}` works! A correct `Options` struct is produced, edge- and error-cases are handled correctly with informative messages (TODO: Implement `--help`.)
2 years ago
Avril 177bf3c4ff
main: Added `parse_args()`: Parses args, converts error to eyre::Report, and adds section for suggestion (TODO: Implement `--help`) and section on which args existed.
2 years ago
Avril 1e9224d53c
args: Added `parse_args()`: Attempt to parse the program args into `Options` struct. Added `ArgParseError`, error describing why parsing arguments failed. Added `EXEC_MODE_STRING_TERMINATOR`, the terminator used when deciding when to stop parsing an arglist for `-exec/{}`. Added `program_name()`, returns the current executable's name as a UTF8 string; if the executable's name was not a UTF8 string, then it is lossily converted to one and invalid characters are replaced.
2 years ago
Avril 0b84adc84f
Added fmt::Display impl for ExecMode that quotes/escapes command or args if needed, and uses the `POSITIONAL_ARG_STRING` for positional arguments.
2 years ago
Avril 682cd8ec15
args::Option::ExecMode Started argument parsed data structure for `-exec <command> [<args>...]` (Stdin) `-exec{} <command> [<args>|{}...]` (Positional `"/proc/self/fd/{}" (3++)`).
2 years ago
Avril 6220233d97
Preparation for adding the command-line option `-exec` and `-exec{}`.
2 years ago
Avril 14f32d6262
Explicitly closes `stdout` before process exits.
3 years ago
Avril bc2357c6b6
Attempted to add new feature flag: `memfile-size-output`: Which will pre- or post- set the size of `stdout` to the correct buffer size, allowing consumers of `collect`'s stdout pipe to know the size just like a file. However, `ftruncate()` always fails on stdout. Before re-enabling the feature, we must find out how to set the size of a pipe, if/when you even can, and what syscall(s) you need to do it with.
3 years ago
Avril b2bf26f245
Bumped minor version.
3 years ago
Avril ab56c93532
Bumped minor version.
3 years ago
Avril 395799587b
Removed `stackalloc` dependency.
3 years ago
Avril 289db974cd
TODO: Find out how to set the length of stdout if possible, so that the consumer of `collect` in a pipeline need only use one syscall splice() to read all the collected data at once, like a file.
3 years ago
Avril b47a27c60f
(partial-merge from (probably) now-defunct branch `hugetlb`.)
3 years ago
Avril 35c6cabce3
memfile::hp: Changed `get_masks()` to return an iterator of `SizedMask` instead of just `Mask` to retain the size of the huge-page itself in bytes (for `ftruncate()` calls, and `mmap()` calls.)
3 years ago
Avril dbf2dbffde
Removed `hugetlb` feature flag: It cannot be used for our use-case.
3 years ago
Avril 8ea3a23e27
Using huge-pages for this is folly from the start: For one, we never actually map the fallocate()'d memory. And two... hugetlbfs does not support write()s in any way, splice()s, send_file()s... It only supports read(), and mmap() (its primary use-case...); which isn't relevant for us.
3 years ago
Avril 872ea74421
It seems creating HUGETLB memory files either just doesn't work or changes their behaviour so that any (or at least, small arbitrary) writing or fallocate()ing to them fails... Read up on MFD_HUGETLB more then re-do a test like `memfd_create_wrapper()` to find out why... and if it depends on the MAP_HUGE_ mask, and if so, find one that works... (We know when masks are invalid, since the error message is different. The masks collected via `get_masks()` *are* valid for this system, they just prevent the fd from being any way useful.)
3 years ago
Avril bdfd0a6268
memfile::hp: Added `PartialEq` impl for `c_int` (checks `MAP_HUGE` from `.raw()`), and `c_uint` (checks `memfd_create()` useable constant from `.mask()`.)
3 years ago
Avril 573845a667
memfile::hp: Added `get_masks()`: Returns an iterator over all `MAP_HUGE` masks found on system.
3 years ago
Avril 9b4bb475c0
Added extension methods for flattening `eyre::Result<eyre:Result<T>>`s and related constructs.
3 years ago
Avril e7b96af012
memfile::hp: Added test for `Mask`'s `.raw()` (`MAP_HUGE_` flag generation.)
3 years ago
Avril c4f73ccfa0
memfile::hp: `find_size_bytes()` fixed and tested; should change function to return `eyre::Result<usize>` instead of `Option<usize>` considering how many different failure-paths exist.
3 years ago
Avril 9c18a5b940
memfile::hp: Added `Mask`: Converts bytes into a suitable `MAP_HUGE_` constant via its `.raw()` method, and a suitable flag for `memfd_create()` via its `.mask()` method.
3 years ago
Avril b882f0ae97
Completed `hp::find_size_bytes()`, and added const-generated lookup table for non "k" separators.
3 years ago
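
The `-exec{}` commits above describe replacing each `{}` argument with a path that refers to the collected data's file descriptor (the messages mention `/dev/fd/{fd}` and `/proc/...` paths). A minimal sketch of just that substitution step follows. The helper name `substitute_fd()` and the exact `/proc/self/fd/<n>` path are assumptions made for illustration and are not taken from the repository.

```rust
use std::ffi::OsString;

/// Placeholder token; mirrors the `{}` used by `-exec{}`.
const PLACEHOLDER: &str = "{}";

/// Hypothetical helper: replace every argument that is exactly `{}` with
/// the `/proc/self/fd/<fd>` path of an inherited file descriptor.
/// Arguments that merely contain `{}` are left untouched.
fn substitute_fd(args: &[&str], fd: i32) -> Vec<OsString> {
    let path = format!("/proc/self/fd/{fd}");
    args.iter()
        .map(|&arg| {
            if arg == PLACEHOLDER {
                OsString::from(&path)
            } else {
                OsString::from(arg)
            }
        })
        .collect()
}

fn main() {
    // e.g. `collect -exec{} wc -c {}` with the data available on fd 3
    for arg in substitute_fd(&["-c", "{}"], 3) {
        println!("{}", arg.to_string_lossy());
    }
}
```

For the child to open such a path, the descriptor has to stay open (and inheritable) until the child has started, which is the failure mode one of the fix commits above describes.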

1
.gitignore vendored

@ -1,4 +1,5 @@
 /target
+/perf
 *~
 # Profiling

41
Cargo.lock generated

@ -84,7 +84,7 @@ checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
 [[package]]
 name = "collect"
-version = "1.0.2"
+version = "1.1.0"
 dependencies = [
 "bitflags",
 "bytes",
@ -96,7 +96,6 @@ dependencies = [
 "libc",
 "memchr",
 "recolored",
-"stackalloc",
 "tracing",
 "tracing-error",
 "tracing-subscriber",
@ -325,30 +324,6 @@ version = "0.1.21"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "7ef03e0a2b150c7a90d01faf6254c9c48a41e95fb2a8c2ac1c6f0d2b9aefc342"
-[[package]]
-name = "rustc_version"
-version = "0.2.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "138e3e0acb6c9fb258b19b67cb8abd63c00679d2851805ea151465464fe9030a"
-dependencies = [
-"semver",
-]
-[[package]]
-name = "semver"
-version = "0.9.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "1d7eb9ef2c18661902cc47e535f9bc51b78acd254da71d375c2f6720d9a40403"
-dependencies = [
-"semver-parser",
-]
-[[package]]
-name = "semver-parser"
-version = "0.7.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "388a1df253eca08550bef6c72392cfe7c30914bf41df5269b68cbd6ff8f570a3"
 [[package]]
 name = "sharded-slab"
 version = "0.1.4"
@ -360,19 +335,9 @@ dependencies = [
 [[package]]
 name = "smallvec"
-version = "1.8.0"
+version = "1.9.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "f2dd574626839106c320a323308629dcb1acfc96e32a8cba364ddc61ac23ee83"
-[[package]]
-name = "stackalloc"
-version = "1.1.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3a5598a6751ba6c0735dbb9d88910ffce156a0bfdbfb7548f915b52990470bc7"
+checksum = "2fd0db749597d91ff862fd1d55ea87f7855a744a8425a64695b6fca237d1dad1"
-dependencies = [
-"cc",
-"rustc_version",
-]
 [[package]]
 name = "syn"

@ -1,6 +1,6 @@
 [package]
 name = "collect"
-version = "1.0.2"
+version = "1.1.0"
 description = "collect all of stdin until it is closed, then output it all to stdout"
 keywords = ["shell", "pipe", "utility", "unix", "linux"]
 authors = ["Avril <flanchan@cumallover.me>"]
@ -18,13 +18,16 @@ license = "GPL-3.0-or-later"
 #
 # # Logging
 # Tracing can be disabled at compile-time for higher performance by disabling the `logging` feature (see above, but remove `,logging` from the features.)
-default = ["mode-memfile", "logging"]
+default = ["mode-memfile", "mode-flags", "logging"]
 ## --- Modes --- ##
+# Enable all flag options
+mode-flags = ["exec"]
 # Mode: default
 # Use physical-memory backed kernel file-descriptors. (see feature `memfile`.)
-mode-memfile = ["memfile-preallocate"] #, "tracing/release_max_level_warn"]
+mode-memfile = ["memfile"] #, "tracing/release_max_level_warn"]
 # Mode: alternative
 # Use non-physical memory allocated buffers.
@ -32,6 +35,9 @@ mode-buffered = ["jemalloc", "bytes"]
 ## --- Individual features --- ##
+# Enable `-exec/{}` flag options
+exec = []
 # Use an in-memory file for storage instead of a byte-buffer.
 #
 # This can drastically improve performance as it allows for the use of `splice()` and `send_file()` syscalls instead of many `read()` and `write()` ones.
@ -41,13 +47,20 @@ mode-buffered = ["jemalloc", "bytes"]
 # * Statically sized (the program can infer the size of standard input.)
 # * The standard input file/buffer pipe size is large enough to pre-allocate enough splicing space to use up the rest of your physical RAM.
 # (This will very likely not happen unless you're specifically trying to make it happen, however.)
-memfile = ["bitflags", "lazy_static", "stackalloc"]
+memfile = ["bitflags"]
 # `memfile`: When unable to determine the size of the input, preallocate the buffer to a multiple of the system page-size before writing to it. This can save extra `ftruncate()` calls, but will also result in the buffer needing to be truncated to the correct size at the end if the sizes do not match.
 #
 # *NOTE*: Requires `getpagesz()` to be available in libc.
 memfile-preallocate = ["memfile"]
+# Set the size of `stdout` when it is known, so consumers can know exactly the size of the input.
+# XXX: Currently doesn't work. TODO: Find out how to make `stdout` `ftruncate()`-able; or some other way to set its size.
+memfile-size-output = ["memfile"]
+# Pre-set the `memfile-size-output`.
+# TODO: Maybe make this a separate feature? See comment about pre-setting in `work::memfd()`...
+# memfile-size-output-preset = ["memfile-size-output"]
 # Use jemalloc instead of system malloc.
 #
@ -88,5 +101,5 @@ recolored = { version = "1.9.3", optional = true }
 memchr = "2.4.1"
 lazy_format = "1.10.0"
 bitflags = {version = "1.3.2", optional = true }
-stackalloc = {version = "1.1.2", optional = true }
+lazy_static = "1.4.0" #TODO: XXX: Required for dispersed error messages
-lazy_static = { version = "1.4.0", optional = true }
+#smallvec = { version = "1.9.0", features = ["write", "const_generics", "const_new", "may_dangle", "union"] }
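
The `exec` feature in the manifest is declared with an empty dependency list, so it can only take effect through conditional compilation in the crate's source. A rough sketch of how such a gate might look is below. The `parse()` function and this trimmed-down `Options` struct are hypothetical stand-ins, not the crate's actual API.

```rust
/// Hypothetical, trimmed-down stand-in for the crate's parsed options.
#[derive(Debug, Default, PartialEq)]
struct Options {
    exec: Vec<String>,
}

/// Built only when the `exec` feature is enabled:
/// `-exec` takes the remaining arguments as the command line to run.
#[cfg(feature = "exec")]
fn parse(args: &[&str]) -> Result<Options, String> {
    match args.split_first() {
        None => Ok(Options::default()),
        Some((&"-exec", rest)) if !rest.is_empty() => Ok(Options {
            exec: rest.iter().map(|s| s.to_string()).collect(),
        }),
        Some((&"-exec", _)) => Err("-exec needs at least a command".to_owned()),
        Some((other, _)) => Err(format!("unknown option `{other}`")),
    }
}

/// Built when the feature is disabled: every flag is rejected as unknown.
#[cfg(not(feature = "exec"))]
fn parse(args: &[&str]) -> Result<Options, String> {
    match args.first() {
        None => Ok(Options::default()),
        Some(other) => Err(format!("unknown option `{other}`")),
    }
}

fn main() {
    // The result depends on whether the binary was built with `--features exec`.
    println!("{:?}", parse(&["-exec", "cat"]));
}
```

Because `mode-flags = ["exec"]` is part of `default` in this diff, a build with `--no-default-features` would presumably compile the second variant unless `exec` is requested explicitly.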

@ -0,0 +1,8 @@
From `strace` examinations, the consumer of a `collect` in the middle of a pipe uses far fewer `splice()/send_file()` calls.
A reduction of over 5 times. But still not just a single one.
# TODO: single syscall reads from consumers of `collect` in pipelines
Is there a way we can set the size of `stdout` before exiting?
I dunno what `sealing` is, but maybe that can be used? Or, if not, a specific `fcntl()` call? Finding this out will allow consumers of `collect`'s output to use a single `splice()` instead of many, greatly improving its performance in pipelines as its output can be used like an actual file's...
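
One thing that can be checked before trying to size `stdout` is what kind of object it actually is, since `ftruncate()` is documented for regular files and shared-memory objects rather than pipes. The sketch below is only an illustration of that check using the standard library; `StdoutKind` and `stdout_kind()` are made-up names, and it does not answer the open question of how to set a pipe's length.

```rust
use std::fs::File;
use std::io;
use std::mem::ManuallyDrop;
use std::os::unix::fs::FileTypeExt;
use std::os::unix::io::FromRawFd;

/// What kind of object standard output currently refers to.
#[derive(Debug, PartialEq, Eq)]
enum StdoutKind {
    Pipe,
    RegularFile,
    Other,
}

/// Inspect fd 1 with `fstat()` (via `File::metadata()`) without taking ownership of it.
fn stdout_kind() -> io::Result<StdoutKind> {
    // SAFETY: fd 1 is assumed to be open; `ManuallyDrop` guarantees this
    // temporary `File` never closes the descriptor.
    let out = ManuallyDrop::new(unsafe { File::from_raw_fd(1) });
    let ty = out.metadata()?.file_type();
    Ok(if ty.is_fifo() {
        StdoutKind::Pipe
    } else if ty.is_file() {
        StdoutKind::RegularFile
    } else {
        StdoutKind::Other
    })
}

fn main() -> io::Result<()> {
    // Report on stderr so the result is visible even when stdout is piped.
    eprintln!("stdout is {:?}", stdout_kind()?);
    Ok(())
}
```

Running it as `prog | cat` versus `prog > out.txt` shows the two cases the notes above care about.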

@ -0,0 +1,744 @@
//! For handling arguments.
use super::*;
use std::ffi::{
OsStr,
OsString,
};
use std::{
iter,
fmt, error,
borrow::Cow,
};
use std::any::type_name;
//TODO: When added, the `args` comptime feature will need to enable `lazy_static`.
use ::lazy_static::lazy_static;
/// The string used for positional argument replacements in `-exec{}`.
pub const POSITIONAL_ARG_STRING: &'static str = "{}";
/// The token that terminates adding arguments for `-exec` / `-exec{}`.
///
/// # Usage
/// If the user wants multiple `-exec/{}` parameters, they must be separated with this token. e.g. `sh$ collect -exec c a b c \; -exec{} c2 d {} e f {} g`
///
/// It is not required for the user to provide the terminator when the `-exec/{}` is the final argument passed, but they can if they wish. e.g. `sh$ collect -exec command a b c` is valid, and `sh$ collect -exec command a b c \;` is *also* valid.
pub const EXEC_MODE_STRING_TERMINATOR: &'static str = ";";
/// Mode for `-exec` / `-exec{}`
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum ExecMode
{
Stdin{command: OsString, args: Vec<OsString>},
Positional{command: OsString, args: Vec<Option<OsString>>},
}
impl fmt::Display for ExecMode
{
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
#[inline]
fn quote_into<'a, const QUOTE: u8>(string: &'a [u8], f: &mut (impl fmt::Write + ?Sized)) -> fmt::Result
{
let data = if memchr::memchr(QUOTE, string).is_some() {
// Escape every occurrence of `QUOTE` with a preceding backslash.
// Each search resumes *after* the quote just handled, so the loop always advances and terminates.
let mut data = Vec::with_capacity(string.len() * 2);
let mut rest = string;
while let Some(location) = memchr::memchr(QUOTE, rest) {
data.extend_from_slice(&rest[..location]);
data.extend([b'\\', QUOTE]);
rest = &rest[(location + 1)..];
}
// Copy whatever follows the final quote.
data.extend_from_slice(rest);
Cow::Owned(data)
} else {
Cow::Borrowed(string)
};
let string = String::from_utf8_lossy(data.as_ref());
// Quote only when needed: the string is empty or contains whitespace.
if string.is_empty() || string.contains(char::is_whitespace)
{
f.write_char(QUOTE as char)?;
f.write_str(string.as_ref())?;
f.write_char(QUOTE as char)
} else {
f.write_str(string.as_ref())
}
}
match self {
Self::Stdin { command, args } => {
quote_into::<b'\''>(command.as_bytes(), f)?;
args.iter().map(move |arg| {
use fmt::Write;
f.write_char(' ').and_then(|_| quote_into::<b'"'>(arg.as_bytes(), f))
}).collect()
},
Self::Positional { command, args } => {
quote_into::<b'\''>(command.as_bytes(), f)?;
args.iter().map(move |arg| {
use fmt::Write;
f.write_char(' ').and_then(|_| match arg.as_ref() {
Some(arg) => quote_into::<b'"'>(arg.as_bytes(), f),
None => f.write_str(POSITIONAL_ARG_STRING),
})
}).collect()
},
}
}
}
impl ExecMode {
#[inline(always)]
pub fn is_positional(&self) -> bool
{
if let Self::Positional { .. } = &self {
true
} else {
false
}
}
#[inline(always)]
pub fn is_stdin(&self) -> bool
{
!self.is_positional()
}
#[inline(always)]
pub fn command(&self) -> &OsStr
{
match self {
Self::Positional { command, .. } |
Self::Stdin { command, .. } =>
command.as_os_str()
}
}
/// Returns an iterator over the arguments.
///
/// Its output type is `Option<&OsStr>`, because the variant may be `Positional`. If it is instead `Stdin`, all values yielded will be `Some()`.
#[inline]
pub fn arguments(&self) -> impl Iterator<Item = Option<&'_ OsStr>>
{
#[derive(Debug, Clone)]
struct ArgIter<'a>(Result<std::slice::Iter<'a, Option<OsString>>, std::slice::Iter<'a, OsString>>);
impl<'a> Iterator for ArgIter<'a>
{
type Item = Option<&'a OsStr>;
#[inline(always)]
fn next(&mut self) -> Option<Self::Item>
{
Some(match &mut self.0 {
Err(n) => Some(n.next()?.as_os_str()),
Ok(n) => n.next().map(|x| x.as_ref().map(|x| x.as_os_str()))?
})
}
#[inline(always)]
fn size_hint(&self) -> (usize, Option<usize>) {
match &self.0 {
Err(n) => n.size_hint(),
Ok(n) => n.size_hint()
}
}
}
impl<'a> ExactSizeIterator for ArgIter<'a>{}
impl<'a> iter::FusedIterator for ArgIter<'a>{}
ArgIter(match self {
Self::Positional { args, .. } => Ok(args.iter()),
Self::Stdin { args, .. } => Err(args.iter())
})
}
/// Returns a tuple of `(command, args)`.
///
/// # Modes
/// * When the variant is `Stdin`, `positional` is ignored and can be `iter::empty()` or an empty array. If it is not empty, it is still ignored.
/// * When the variant is `Positional`, `positional` is iterated on for every instance a positional argument should appear.
/// If the iterator completes and there are positional arguments left, they are removed from the iterator's output, and the next argument is shifted along. `iter::repeat(arg)` can be used to insert the same argument into each instance where a positional argument is expected.
#[inline]
pub fn into_process_info<I>(self, positional: I) -> (OsString, ExecModeArgIterator<I>)
where I: IntoIterator<Item=OsString>,
{
match self {
Self::Stdin { command, args } => (command, ExecModeArgIterator::Stdin(args.into_iter())),
Self::Positional { command, args } => (command,
ExecModeArgIterator::Positional(ArgZippingIter(args.into_iter(),
positional.into_iter().fuse()))),
}
}
/// # Panics
/// If the variant of the enum was `Positional`.
#[inline]
pub fn into_process_info_stdin(self) -> (OsString, ExecModeArgIterator<NoPositionalArgs>)
{
#[cold]
#[inline(never)]
fn _panic_invalid_invariant() -> !
{
panic!("Invalid invariant for ExecMode: Expected `Stdin`, was `Positional`.")
}
match self {
Self::Stdin { command, args } => (command, ExecModeArgIterator::Stdin(args.into_iter())),
_ => _panic_invalid_invariant()
}
}
}
pub struct ArgZippingIter<T>(std::vec::IntoIter<Option<OsString>>, iter::Fuse<T::IntoIter>)
where T: IntoIterator<Item = OsString>;
/// Private trait used to mark an instantiation of `ExecModeArgIterator<T>` as never being the `Positional` variant.
unsafe trait NoPositional{}
pub enum NoPositionalArgs{}
impl Iterator for NoPositionalArgs
{
type Item = OsString;
fn next(&mut self) -> Option<Self::Item>
{
match *self{}
}
fn size_hint(&self) -> (usize, Option<usize>) {
(0, Some(0))
}
}
unsafe impl NoPositional for NoPositionalArgs{}
unsafe impl NoPositional for std::convert::Infallible{}
impl ExactSizeIterator for NoPositionalArgs{}
impl DoubleEndedIterator for NoPositionalArgs
{
fn next_back(&mut self) -> Option<Self::Item> {
match *self{}
}
}
impl iter::FusedIterator for NoPositionalArgs{}
impl From<std::convert::Infallible> for NoPositionalArgs
{
fn from(from: std::convert::Infallible) -> Self
{
match from{}
}
}
pub enum ExecModeArgIterator<T: IntoIterator<Item = OsString>> {
Stdin(std::vec::IntoIter<OsString>),
Positional(ArgZippingIter<T>),
}
impl<I> Iterator for ExecModeArgIterator<I>
where I: IntoIterator<Item = OsString>
{
type Item = OsString;
#[inline]
fn next(&mut self) -> Option<Self::Item>
{
loop {
break match self {
Self::Stdin(vec) => vec.next(),
Self::Positional(ArgZippingIter(ref mut vec, ref mut pos)) => {
match vec.next()? {
None => {
match pos.next() {
None => continue,
replace => replace,
}
},
set => set,
}
},
}
}
}
#[inline(always)]
fn size_hint(&self) -> (usize, Option<usize>) {
match self {
Self::Stdin(vec, ..) => vec.size_hint(),
Self::Positional(ArgZippingIter(vec, ..)) => vec.size_hint(),
}
}
}
impl<I> iter::FusedIterator for ExecModeArgIterator<I>
where I: IntoIterator<Item = OsString>{}
// ExecModeArgIterator can never be `ExactSizeIterator` if it is *ever* `Positional`
impl<I: NoPositional> ExactSizeIterator for ExecModeArgIterator<I>
where I: IntoIterator<Item = OsString>{}
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Default)]
pub struct Options {
/// For `-exec` (stdin exec) and `-exec{}` (positional exec)
exec: Vec<ExecMode>,
}
impl Options
{
#[inline(always)]
fn count_exec(&self) -> (usize, usize)
{
self.exec.is_empty().then(|| (0, 0))
.or_else(move ||
self.exec.iter().map(|x| {
x.is_positional().then(|| (0, 1)).unwrap_or((1, 0))
})
.reduce(|(s, p), (s1, p1)| (s + s1, p + p1)))
.unwrap_or((0,0))
}
/// Has `-exec` (stdin) or `-exec{}` (positional)
///
/// Tuple element 1 is for `-exec`; element 2 is for `-exec{}`.
#[inline(always)]
pub fn has_exec(&self) -> (bool, bool)
{
self.exec.is_empty().then(|| (false, false))
.or_else(move ||
self.exec.iter().map(|x| {
let x = x.is_positional();
(!x, x)
})
.reduce(|(s, p), (s1, p1)| (s || s1, p || p1)))
.unwrap_or((false, false))
}
#[inline]
pub fn has_positional_exec(&self) -> bool
{
self.has_exec().1
}
#[inline]
pub fn has_stdin_exec(&self) -> bool
{
self.has_exec().0
}
#[inline]
pub fn opt_exec(&self) -> impl Iterator<Item= &'_ ExecMode> + ExactSizeIterator + iter::FusedIterator + DoubleEndedIterator
{
self.exec.iter()
}
#[inline]
pub fn into_opt_exec(self) -> impl Iterator<Item=ExecMode> + ExactSizeIterator + iter::FusedIterator
{
self.exec.into_iter()
}
}
/// The executable name of this program.
///
/// # Returns
/// * If the program's executable name is a valid UTF8 string, that string.
/// * If it is not, then that string is lossily converted to a UTF8 string, with invalid characters replaced accordingly. This can be detected by checking whether the return value is `Cow::Owned`; if it is, the value is not a reliable indication of the executable path's basename.
/// * If there is no program name provided, i.e. if `argc == 0`, then an empty string is returned.
#[inline(always)]
pub fn program_name() -> Cow<'static, str>
{
lazy_static! {
static ref NAME: OsString = std::env::args_os().next().unwrap_or(OsString::from_vec(Vec::new()));
}
String::from_utf8_lossy(NAME.as_bytes())
}
/// Parse the program's arguments into an `Options` struct.
/// If parsing fails, an `ArgParseError` is returned detailing why it failed.
#[inline]
#[cfg_attr(feature="logging", instrument(err(Debug)))]
pub fn parse_args() -> Result<Options, ArgParseError>
{
let iter = std::env::args_os();
if_trace!(trace!("argc == {}, argv == {iter:?}", iter.len()));
parse_from(iter.skip(1))
}
#[inline(always)]
pub fn type_name_short<T: ?Sized>() -> &'static str
{
let mut s = std::any::type_name::<T>();
if let Some(idx) = memchr::memrchr(b':', s.as_bytes()) {
s = &s[idx.saturating_sub(1)..];
if s.len() >= 2 && &s[..2] == "::" {
s = &s[2..];
}
}
if s.len() > 0 && (s.as_bytes()[s.len()-1] == b'>' && s.as_bytes()[0] != b'<') {
s = &s[..(s.len()-1)];
}
s
}
#[cfg_attr(feature="logging", instrument(level="debug", skip_all, fields(args = ?type_name_short::<I>())))]
fn parse_from<I, T>(args: I) -> Result<Options, ArgParseError>
where I: IntoIterator<Item = T>,
T: Into<OsString>
{
let mut args = args.into_iter().map(Into::into);
let mut output = Options::default();
let mut idx = 0;
//XXX: When `-exec{}` is provided, but no `{}` arguments are found, maybe issue a warning with `if_trace!(warn!())`? There are valid situations to do this in, but they are rare...
let mut parser = || -> Result<_, ArgParseError> {
while let Some(mut arg) = args.next() {
idx += 1;
macro_rules! try_parse_for {
(@ assert_parser_okay $parser:path) => {
const _:() = {
const fn _assert_is_parser<P: TryParse + ?Sized>() {}
const fn _assert_is_result<P: TryParse + ?Sized>(res: P::Output) -> P::Output { res }
_assert_is_parser::<$parser>();
};
};
($parser:path => $then:expr) => {
{
try_parse_for!(@ assert_parser_okay $parser);
//_assert_is_closure(&$then); //XXX: There isn't a good way to tell without having to specify some bound on return type...
if let Some(result) = parsers::try_parse_with::<$parser>(&mut arg, &mut args) {
// Result succeeded on visitation, use this parser for this argument and then continue to the next one.
$then(result?);
continue;
}
// Result failed on visitation, so continue the control flow to the next `try_parse_for!()` visitation attempt.
}
};
/*($parser:path => $then:expr) => {
$then(try_parse_for!(try $parser => std::convert::identity)?)
}*/
}
//TODO: Add `impl TryParse` struct for `--help` and add it at the *top* of the visitation stack (it will most likely appear there.)
// This may require a re-work of the `Options` struct, or an enum wrapper around it should be returned instead of options directly, for special modes (like `--help` is, etc.) Perhaps `pub enum Mode { Normal(Options), Help, }` or something should be returned, and `impl From<Options>` for it, with the caller of this closure (below)
try_parse_for!(parsers::ExecMode => |result| output.exec.push(result));
//Note: try_parse_for!(parsers::SomeOtherOption => |result| output.some_other_option.set(result.something)), etc, for any newly added arguments.
if_trace!(debug!("reached end of parser visitation for argument #{idx} {arg:?}! Failing now with `UnknownOption`"));
return Err(ArgParseError::UnknownOption(arg));
}
Ok(())
};
parser()
.with_index(idx)
.map(move |_| output.into()) //XXX: This is `output.into()`, because when successful result return type is changed from directly `Options` to `enum Mode` (which will `impl From<Options>`), it will allow any `impl Into<Mode>` to be returned. (Boxed dynamic dispatch with a trait `impl FromMode<T: ?Sized> (for Mode) { fn from(val: Box<T>) -> Self { IntoMode::into(val) } }, auto impl trait IntoMode { fn into(self: Box<Self>) -> Mode }` may be required if different types are returned from the closure, this is okay, as argument parsed struct can get rather large.)
}
#[derive(Debug)]
pub enum ArgParseError
{
/// With an added argument index.
WithIndex(usize, Box<ArgParseError>),
/// Returned when an invalid or unknown argument is found
UnknownOption(OsString),
/// Returned when the argument, `argument`, is passed in an invalid context by the user.
InvalidUsage { argument: String, message: String, inner: Option<Box<dyn error::Error + Send + Sync + 'static>> },
//VisitationFailed,
}
trait ArgParseErrorExt<T>: Sized
{
fn with_index(self, idx: usize) -> Result<T, ArgParseError>;
}
impl ArgParseError
{
#[inline]
pub fn wrap_index(self, idx: usize) -> Self {
Self::WithIndex(idx, Box::new(self))
}
}
impl<T, E: Into<ArgParseError>> ArgParseErrorExt<T> for Result<T, E>
{
#[inline(always)]
fn with_index(self, idx: usize) -> Result<T, ArgParseError> {
self.map_err(Into::into)
.map_err(move |e| e.wrap_index(idx))
}
}
impl error::Error for ArgParseError
{
#[inline]
fn source(&self) -> Option<&(dyn error::Error + 'static)> {
match self {
Self::InvalidUsage { inner, .. } => inner.as_ref().map(|x| -> &(dyn error::Error + 'static) { x.as_ref() }),
Self::WithIndex(_, inner) => inner.source(),
_ => None,
}
}
}
impl fmt::Display for ArgParseError
{
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
match self {
Self::WithIndex(index, inner) => write!(f, "Argument #{index}: {inner}"),
Self::UnknownOption(opt) => {
f.write_str("Invalid/unknown argument: `")?;
f.write_str(String::from_utf8_lossy(opt.as_bytes()).as_ref())?;
f.write_str("`")
},
Self::InvalidUsage { argument, message, .. } => write!(f, "Invalid usage for argument `{argument}`: {message}")
}
}
}
trait ArgError: error::Error + Send + Sync + 'static
{
fn into_invalid_usage(self) -> (String, String, Box<dyn error::Error + Send + Sync + 'static>)
where Self: Sized;
}
trait TryParse: Sized
{
type Error: ArgError;
type Output;
#[inline(always)]
fn visit(argument: &OsStr) -> Option<Self> { let _ = argument; None }
fn parse<I: ?Sized>(self, argument: OsString, rest: &mut I) -> Result<Self::Output, Self::Error>
where I: Iterator<Item = OsString>;
}
impl<E: error::Error + Send + Sync + 'static> From<(String, String, E)> for ArgParseError
{
#[inline]
fn from((argument, message, inner): (String, String, E)) -> Self
{
Self::InvalidUsage { argument, message, inner: Some(Box::new(inner)) }
}
}
impl<E: ArgError> From<E> for ArgParseError
{
#[inline(always)]
fn from(from: E) -> Self
{
let (argument, message, inner) = from.into_invalid_usage();
Self::InvalidUsage { argument, message, inner: Some(inner) }
}
}
#[inline(always)]
fn extract_last_pathspec<'a>(s: &'a str) -> &'a str
{
//#[cfg_attr(feature="logging", feature(instrument(ret)))]
#[allow(dead_code)]
fn string_diff<'a>(a: &'a str, b: &'a str) -> usize
{
#[cold]
#[inline(never)]
fn _panic_non_inclusive(swap: bool) -> !
{
let a = swap.then(|| "b").unwrap_or("a");
let b = swap.then(|| "a").unwrap_or("b");
panic!("String {a} was not inside string {b}")
}
let a_addr = a.as_ptr() as usize;
let b_addr = b.as_ptr() as usize;
let (a_addr, b_addr, sw) =
if !(a_addr + a.len() > b_addr + b.len() && b_addr + b.len() < a_addr + a.len()) {
(b_addr, a_addr, true)
} else {
(a_addr, b_addr, false)
};
if b_addr < a_addr /*XXX || (b_addr + b.len()) > (a_addr + a.len())*/ {
_panic_non_inclusive(sw)
}
return a_addr.abs_diff(b_addr);
}
s.rsplit_once("::")
.map(|(_a, b)| /*XXX: This doesn't work...match _a.rsplit_once("::") {
Some((_, last)) => &s[string_diff(s, last)..],
_ => b
}*/ b)
.unwrap_or(s)
}
mod parsers {
use super::*;
#[inline(always)]
#[cfg_attr(feature="logging", instrument(level="debug", skip(rest), fields(parser = %extract_last_pathspec(type_name::<P>()))))]
pub(super) fn try_parse_with<P>(arg: &mut OsString, rest: &mut impl Iterator<Item = OsString>) -> Option<Result<P::Output, ArgParseError>>
where P: TryParse
{
#[cfg(feature="logging")]
let _span = tracing::warn_span!("parse", parser= %extract_last_pathspec(type_name::<P>()), ?arg);
P::visit(arg.as_os_str()).map(move |parser| {
#[cfg(feature="logging")]
let _in = _span.enter();
parser.parse(/*if_trace!{true arg.clone(); */std::mem::replace(arg, OsString::default()) /*}*/, rest).map_err(Into::into) //This clone is not needed, the argument is captured by `try_parse_with` (in debug) and `parse` (in warning) already.
}).map(|res| {
#[cfg(feature="logging")]
match res.as_ref() {
Err(err) => {
::tracing::event!(::tracing::Level::ERROR, ?err, "Attempted parse failed with error")
},
_ => ()
}
res
}).or_else(|| {
#[cfg(feature="logging")]
::tracing::event!(::tracing::Level::TRACE, "no match for this parser with this arg, continuing visitation.");
None
})
}
/// Parser for `ExecMode`
///
/// Parses `-exec` / `-exec{}` modes.
#[derive(Debug, Clone, Copy)]
pub enum ExecMode {
Stdin,
Postional,
}
impl ExecMode {
#[inline(always)]
fn is_positional(&self) -> bool
{
match self {
Self::Postional => true,
_ => false
}
}
#[inline(always)]
fn command_string(&self) -> &'static str
{
if self.is_positional() {
"-exec{}"
} else {
"-exec"
}
}
}
#[derive(Debug)]
pub struct ExecModeParseError(ExecMode);
impl error::Error for ExecModeParseError{}
impl fmt::Display for ExecModeParseError
{
#[inline(always)]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
write!(f, "{} needs at least a command", self.0.command_string())
}
}
impl ArgError for ExecModeParseError
{
fn into_invalid_usage(self) -> (String, String, Box<dyn error::Error + Send + Sync + 'static>)
where Self: Sized {
(self.0.command_string().to_owned(), "Expected a command file-path to execute.".to_owned(), Box::new(self))
}
}
impl TryParse for ExecMode
{
type Error = ExecModeParseError;
type Output = super::ExecMode;
#[inline(always)]
fn visit(argument: &OsStr) -> Option<Self> {
if argument == OsStr::from_bytes(b"-exec") {
Some(Self::Stdin)
} else if argument == OsStr::from_bytes(b"-exec{}") {
Some(Self::Postional)
} else {
None
}
}
#[inline]
fn parse<I: ?Sized>(self, _argument: OsString, rest: &mut I) -> Result<Self::Output, Self::Error>
where I: Iterator<Item = OsString> {
mod warnings {
use super::*;
/// Issue a warning when `-exec{}` is provided as an argument, but no positional arguments (`{}`) are specified in the argument list to the command.
#[cold]
#[cfg_attr(feature="logging", inline(never))]
#[cfg_attr(not(feature="logging"), inline(always))]
pub fn execp_no_positional_replacements()
{
if_trace!(warn!("-exec{{}} provided with no positional arguments `{}`, there will be no replacement with the data. Did you mean `-exec`?", POSITIONAL_ARG_STRING));
}
/// Issue a warning if the user apparently meant to specify two `-exec/{}` arguments to `collect`, but is seemingly passing the second `-exec/{}` string as an argument to the first by accident.
#[cold]
#[cfg_attr(feature="logging", inline(never))]
#[cfg_attr(not(feature="logging"), inline(always))]
pub fn exec_apparent_missing_terminator(first_is_positional: bool, second_is_positional: bool, command: &OsStr, argument_number: usize)
{
if_trace! {
warn!("{} provided, but argument to command {command:?} number #{argument_number} is `{}`. Are you missing the terminator '{}' before this argument?", if first_is_positional {"-exec{}"} else {"-exec"}, if second_is_positional {"-exec{}"} else {"-exec"}, EXEC_MODE_STRING_TERMINATOR)
}
}
/// Issue a warning if the user apparently missed a command to `-exec{}`, and has typed `-exec{} {}`...
#[cold]
#[cfg_attr(feature="logging", inline(never))]
#[cfg_attr(not(feature="logging"), inline(always))]
//TODO: Do we make this a feature in the future? Being able to `fexecve()` the input received?
pub fn execp_command_not_substituted()
{
if_trace! {
warn!("-exec{{}} provided with a command as the positional replacement string `{}`. Commands are not substituted. Are you missing a command? (Note: Currently, `fexecve()`ing the input is not supported.)", POSITIONAL_ARG_STRING)
}
}
/// Issue a warning if the user apparently attempted to terminate a `-exec/{}` argument, but instead made the command of the `-exec/{}` itself the terminator.
#[cold]
#[cfg_attr(feature="logging", inline(never))]
#[cfg_attr(not(feature="logging"), inline(always))]
pub fn exec_terminator_as_command(exec_arg_str: &str)
{
if_trace! {
warn!("{exec_arg_str} provided with a command that is the -exec/-exec{{}} terminator character `{}`. The sequence is not terminated, and instead the terminator character itself is taken as the command to execute. Did you miss a command before the terminator?", EXEC_MODE_STRING_TERMINATOR)
}
}
}
let command = rest.next().ok_or_else(|| ExecModeParseError(self))?;
if command == EXEC_MODE_STRING_TERMINATOR {
warnings::exec_terminator_as_command(self.command_string());
}
let test_warn_missing_term = |(idx , string) : (usize, OsString)| {
if let Some(val) = Self::visit(&string) {
warnings::exec_apparent_missing_terminator(self.is_positional(), val.is_positional(), &command, idx + 1);
}
string
};
Ok(match self {
Self::Stdin => {
super::ExecMode::Stdin {
args: rest
.take_while(|argument| argument.as_bytes() != EXEC_MODE_STRING_TERMINATOR.as_bytes())
.enumerate().map(&test_warn_missing_term)
.collect(),
command,
}
},
Self::Postional => {
let mut repl_warn = true;
if command == POSITIONAL_ARG_STRING {
warnings::execp_command_not_substituted();
}
let res = super::ExecMode::Positional {
args: rest
.take_while(|argument| argument.as_bytes() != EXEC_MODE_STRING_TERMINATOR.as_bytes())
.enumerate().map(&test_warn_missing_term)
.map(|x| if x.as_bytes() == POSITIONAL_ARG_STRING.as_bytes() {
repl_warn = false;
None
} else {
Some(x)
})
.collect(),
command,
};
if repl_warn { warnings::execp_no_positional_replacements(); }
res
},
})
}
}
}
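The two-step flow used by `TryParse` above (`visit` decides from the flag alone whether a parser applies, `parse` then consumes the command and its arguments up to the terminator) can be sketched in isolation. This is a minimal sketch, not the crate's implementation: it uses `String` instead of `OsString`, and `TERMINATOR`, `PLACEHOLDER`, `Exec`, and `parse_exec` are hypothetical names. The real values of `EXEC_MODE_STRING_TERMINATOR` and `POSITIONAL_ARG_STRING` are not shown in this diff, so `";"` and `"{}"` are assumptions.

```rust
/// Hypothetical stand-ins for the crate's constants (assumed values).
const TERMINATOR: &str = ";";
const PLACEHOLDER: &str = "{}";

#[derive(Debug, PartialEq)]
enum Exec {
    /// `-exec`: the collected data is fed to the command's stdin.
    Stdin { command: String, args: Vec<String> },
    /// `-exec{}`: every placeholder argument becomes `None`, to be filled in later.
    Positional { command: String, args: Vec<Option<String>> },
}

/// Returns `None` when the flag is not an exec flag, so that the caller can
/// try the next parser; otherwise returns the parse result.
fn parse_exec(flag: &str, rest: &mut impl Iterator<Item = String>) -> Option<Result<Exec, String>> {
    // Step 1 ("visit"): look at the flag only.
    let positional = match flag {
        "-exec" => false,
        "-exec{}" => true,
        _ => return None,
    };
    // Step 2 ("parse"): the command is mandatory.
    let command = match rest.next() {
        Some(command) => command,
        None => return Some(Err(format!("{flag} needs at least a command"))),
    };
    // `take_while` also consumes the terminator, so parsing resumes after it.
    let args = rest.take_while(|arg| arg.as_str() != TERMINATOR);
    Some(Ok(if positional {
        Exec::Positional {
            command,
            args: args.map(|a| if a == PLACEHOLDER { None } else { Some(a) }).collect(),
        }
    } else {
        Exec::Stdin { command, args: args.collect() }
    }))
}
```

Returning `Option<Result<..>>` keeps "this parser does not apply" (`None`) separate from "this parser applies but the input is invalid" (`Some(Err(..))`), which is the same distinction `try_parse_with` makes.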

@ -0,0 +1,207 @@
//! Errors and helpers for errors.
//TODO: Comment how this works (controllably optional simple or complex `main()` error messages.)
use super::*;
use std::{
fmt,
error,
};
use std::os::unix::prelude::*;
pub const DEFAULT_USE_ENV: bool = std::option_env!("NO_RT_ERROR_CTL").is_none();
pub type DispersedResult<T, const USE_ENV: bool = DEFAULT_USE_ENV> = Result<T, Dispersed<USE_ENV>>;
pub const ENV_NAME: &'static str = "RUST_VERBOSE";
lazy_static!{
static ref DEFAULT_ENV_VERBOSE: DispersedVerbosity = match std::option_env!("DEFAULT_ERROR") {
Some("1") |
Some("V") |
Some("verbose") |
Some("VERBOSE") |
Some("v") => DispersedVerbosity::Verbose,
Some("0") |
_ => DispersedVerbosity::static_default(),
};
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Copy)]
#[repr(u8)]
pub enum DispersedVerbosity
{
Simple = 0,
Verbose = 1,
}
impl From<DispersedVerbosity> for bool
{
#[inline]
fn from(from: DispersedVerbosity) -> Self
{
from.is_verbose()
}
}
impl Default for DispersedVerbosity
{
#[inline]
fn default() -> Self
{
*DEFAULT_ENV_VERBOSE
}
}
impl DispersedVerbosity
{
#[inline(always)]
const fn static_default() -> Self
{
Self::Simple
}
#[inline(always)]
pub const fn is_verbose(self) -> bool
{
(self as u8) != 0
}
}
fn get_env_value() -> DispersedVerbosity
{
match std::env::var_os(ENV_NAME) {
Some(mut value) => {
value.make_ascii_lowercase();
match value.as_bytes() {
b"1" |
b"v" |
b"verbose" => DispersedVerbosity::Verbose,
b"0" |
b"s" |
b"simple" => DispersedVerbosity::Simple,
_ => DispersedVerbosity::default(),
}
},
None => Default::default(),
}
}
#[inline]
pub fn dispersed_env_verbosity() -> DispersedVerbosity
{
lazy_static! {
static ref VALUE: DispersedVerbosity = get_env_value();
}
*VALUE
}
/// A simpler error message when returning an `eyre::Report` from main.
pub struct Dispersed<const USE_ENV: bool = DEFAULT_USE_ENV>(eyre::Report);
impl<const E: bool> From<eyre::Report> for Dispersed<E>
{
#[inline]
fn from(from: eyre::Report) -> Self
{
Self(from)
}
}
impl<const E: bool> Dispersed<E>
{
#[inline]
pub fn into_inner(self) -> eyre::Report
{
self.0
}
}
impl Dispersed<false>
{
#[inline(always)]
pub fn obey_env(self) -> Dispersed<true>
{
Dispersed(self.0)
}
}
impl Dispersed<true>
{
#[inline(always)]
pub fn ignore_env(self) -> Dispersed<false>
{
Dispersed(self.0)
}
}
impl<const E: bool> Dispersed<E>
{
#[inline(always)]
pub fn set_env<const To: bool>(self) -> Dispersed<To>
{
Dispersed(self.0)
}
}
impl error::Error for Dispersed<true>
{
#[inline]
fn source(&self) -> Option<&(dyn error::Error + 'static)> {
self.0.source()
}
}
impl error::Error for Dispersed<false>
{
#[inline]
fn source(&self) -> Option<&(dyn error::Error + 'static)> {
self.0.source()
}
}
impl fmt::Debug for Dispersed<false>
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
fmt::Display::fmt(&self.0, f)
}
}
impl fmt::Display for Dispersed<false>
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
fmt::Debug::fmt(&self.0, f)
}
}
impl fmt::Debug for Dispersed<true>
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
if dispersed_env_verbosity().is_verbose() {
fmt::Debug::fmt(&self.0, f)
} else {
fmt::Display::fmt(&self.0, f)
}
}
}
impl fmt::Display for Dispersed<true>
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
if dispersed_env_verbosity().is_verbose() {
fmt::Display::fmt(&self.0, f)
} else {
fmt::Debug::fmt(&self.0, f)
}
}
}
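The idea behind `Dispersed` (a failing `main()` prints its error through `Debug`, so a wrapper whose `Debug` forwards to `Display` yields the short message) can be shown with the standard library alone. This is a minimal sketch under assumptions: `Terse` and `parse_verbosity` are hypothetical names, verbosity is chosen purely by a const parameter, and the runtime `RUST_VERBOSE` lookup from the module above is reduced to a pure string-matching helper.

```rust
use std::fmt;

/// Hypothetical wrapper: `Debug` prints the short `Display` message unless
/// the `VERBOSE` const parameter is set.
pub struct Terse<E, const VERBOSE: bool = false>(pub E);

impl<E: fmt::Debug + fmt::Display, const VERBOSE: bool> fmt::Debug for Terse<E, VERBOSE> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if VERBOSE {
            fmt::Debug::fmt(&self.0, f)
        } else {
            fmt::Display::fmt(&self.0, f)
        }
    }
}

/// Lets `?` convert a plain error into the wrapper.
impl<E, const VERBOSE: bool> From<E> for Terse<E, VERBOSE> {
    fn from(error: E) -> Self {
        Self(error)
    }
}

/// Interpret a verbosity setting the way `get_env_value()` does:
/// case-insensitive `1`/`v`/`verbose` or `0`/`s`/`simple`.
/// `None` means "unrecognised, fall back to the default".
pub fn parse_verbosity(value: &str) -> Option<bool> {
    match value.to_ascii_lowercase().as_str() {
        "1" | "v" | "verbose" => Some(true),
        "0" | "s" | "simple" => Some(false),
        _ => None,
    }
}
```

With this shape, a `fn main() -> Result<(), Terse<SomeError>>` would report only the `Display` text of the error on failure.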

@ -0,0 +1,146 @@
//! Used for implementation of `-exec[{}]`
use super::*;
use args::Options;
use std::{
fs,
process,
path::{
Path,
PathBuf,
},
ffi::{
OsStr,
OsString,
}
};
/// Get a path to the file descriptor referred to by `file`.
#[cfg_attr(feature="logging", instrument(skip_all, fields(fd = ?file.as_raw_fd())))]
fn proc_file<F: ?Sized + AsRawFd>(file: &F) -> PathBuf
{
let fd = file.as_raw_fd();
let pid = process::id();
//process::Command::new("/bin/ls").arg("-l").arg(format!("/proc/{pid}/fd/")).spawn().unwrap().wait().unwrap();
format!("/proc/{pid}/fd/{fd}").into()
//format!("/dev/fd/{fd}").into()
}
/// Attempt to `dup()` a file descriptor into a `RawFile`.
#[inline]
#[cfg_attr(feature="logging", instrument(skip_all, err, fields(fd = ?file.as_raw_fd())))]
fn dup_file<F: ?Sized + AsRawFd>(file: &F) -> io::Result<memfile::RawFile>
{
let fd = file.as_raw_fd();
debug_assert!(fd >= 0, "Bad input file descriptor from {} (value was {fd})", std::any::type_name::<F>());
let fd = unsafe {
let res = libc::dup(fd);
if res < 0 {
return Err(io::Error::last_os_error());
} else {
res
}
};
Ok(memfile::RawFile::take_ownership_of_unchecked(fd))
}
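For comparison, the duplicate-then-build-a-path steps above can be sketched with the standard library only. This is not the crate's code: `dup_owned` and `proc_path` are hypothetical names, and the sketch assumes a Unix target (the `/proc` path additionally assumes Linux). One behavioural difference to keep in mind: the standard library duplicates with `F_DUPFD_CLOEXEC`, whereas a bare `dup()` does not set close-on-exec on the new descriptor.

```rust
use std::io;
use std::os::fd::{AsFd, AsRawFd, OwnedFd};
use std::path::PathBuf;

/// Duplicate any descriptor into an `OwnedFd`, which closes itself on drop.
/// Note: the duplicate has close-on-exec set, unlike the result of `dup()`.
fn dup_owned(file: &impl AsFd) -> io::Result<OwnedFd> {
    file.as_fd().try_clone_to_owned()
}

/// Path under which the current process can re-open `fd` on Linux.
fn proc_path(fd: &impl AsRawFd) -> PathBuf {
    format!("/proc/self/fd/{}", fd.as_raw_fd()).into()
}
```

Because `OwnedFd` owns the descriptor, it has to be kept alive for as long as any consumer is expected to open the path returned by `proc_path`.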
#[cfg_attr(feature="logging", instrument(skip_all, fields(has_stdin = ?file.is_some(), filename = ?filename.as_ref())))]
fn run_stdin<I>(file: Option<impl Into<fs::File>>, filename: impl AsRef<OsStr>, args: I) -> io::Result<(process::Child, Option<fs::File>)>
where I: IntoIterator<Item = OsString>,
{
let file = {
let file: Option<fs::File> = file.map(Into::into);
//TODO: Do we need to fcntl() this to make it (the fd) RW?
match file {
None => None,
Some(mut file) => {
use std::io::Seek;
if let Err(err) = file.seek(io::SeekFrom::Start(0)) {
if_trace!(warn!("Failed to seek to start: {err}"));
}
let _ = try_seal_size(&file);
Some(file)
},
}
};
let stdin = match file.as_ref() {
// Propagate a failed `dup()` to the caller instead of panicking.
Some(file) => process::Stdio::from(fs::File::from(dup_file(file)?)),
None => process::Stdio::null(),
};
let child = process::Command::new(filename)
.args(args)
.stdin(stdin) //XXX: Maybe change to `piped()` and `io::copy()` from beginning (using pread()/send_file()/copy_file_range()?)
.stdout(process::Stdio::inherit())
.stderr(process::Stdio::inherit())
.spawn()?;
//TODO: XXX: Why does `/proc/{pid}/fd/{fd}` **and** `/dev/fd/{fd}` not work for -exec{}, and why does `Stdio::from(file)` not work for stdin even *after* re-seeking the file???
/*
if let Some((mut input, mut output)) = file.as_mut().zip(child.stdin.take()) {
io::copy(&mut input, &mut output)
/*.wrap_err("Failed to pipe file into stdin for child")*/?;
}*/
if_trace!(info!("Spawned child process: {}", child.id()));
/*Ok(child.wait()?
.code()
.unwrap_or(-1)) //XXX: What should we do if the process terminates without a code (i.e. by a signal?)
*/
Ok((child, file))
}
/// Run a single `-exec` / `-exec{}` and return the (possibly still running) child process if succeeded in spawning.
///
/// The caller must wait for all child processes to exit before the parent does.
#[inline]
#[cfg_attr(feature="logging", instrument(skip(file), err))]
pub fn run_single<F: ?Sized + AsRawFd>(file: &F, opt: args::ExecMode) -> io::Result<(process::Child, Option<fs::File>)>
{
let input: std::mem::ManuallyDrop<memfile::RawFile> = std::mem::ManuallyDrop::new(dup_file(file)?);
match opt {
args::ExecMode::Positional { command, args } => {
run_stdin(None::<fs::File>, command, args.into_iter().map(|x| x.unwrap_or_else(|| proc_file(&*input).into())))
},
args::ExecMode::Stdin { command, args } => {
run_stdin(Some(std::mem::ManuallyDrop::into_inner(input)), command, args)
}
}
}
/// Spawn all `-exec/{}` commands and return all running children.
///
/// # Returns
/// An iterator of each (possibly running) spawned child, or the error that occurred when trying to spawn that child from the `exec` option in `opt`.
#[cfg_attr(feature="logging", instrument(skip(file)))]
pub fn spawn_from<'a, F: ?Sized + AsRawFd>(file: &'a F, opt: Options) -> impl IntoIterator<Item = io::Result<(process::Child, Option<fs::File>)>> + 'a
{
opt.into_opt_exec().map(|x| run_single(file, x))
}
/// Spawn all `-exec/{}` commands and wait for all children to complete.
///
/// # Returns
/// An iterator of the result of spawning each child and its exit status (if one exists)
///
/// If the child exited via a signal termination, or another method that does not return a status, the iterator's result will be `Ok(None)`
#[inline]
#[cfg_attr(feature="logging", instrument(skip(file)))]
pub fn spawn_from_sync<'a, F: ?Sized + AsRawFd>(file: &'a F, opt: Options) -> impl IntoIterator<Item = eyre::Result<Option<i32>>> + 'a
{
spawn_from(file, opt).into_iter().zip(0..).map(move |(child, idx)| -> eyre::Result<_> {
let idx = move || idx.to_string().header("The child index");
match child {
Ok(mut child) => {
Ok(child.0.wait()
.wrap_err("Failed to wait on child")
.with_note(|| "The child may have detached itself")
.with_section(idx)?
.code())
},
Err(err) => {
if_trace!(error!("Failed to spawn child: {err}"));
Err(err)
.wrap_err("Failed to spawn child")
}
}.with_section(idx)
})
}

@ -0,0 +1,419 @@
//! Extensions
use super::*;
use std::{
mem::{
self,
ManuallyDrop,
},
marker::PhantomData,
ops,
iter,
};
/// Essentially the same bound as `eyre::StdError` (a private trait).
///
/// Useful for when using traits that convert a generic type into an `eyre::Report`.
pub trait EyreError: std::error::Error + Send + Sync + 'static{}
impl<T: ?Sized> EyreError for T
where T: std::error::Error + Send + Sync + 'static{}
#[derive(Debug, Clone)]
pub struct Joiner<I, F>(I, F, bool);
#[derive(Debug, Clone, Copy)]
pub struct CloneJoiner<T>(T);
impl<I, F> Joiner<I, F>
{
#[inline(always)]
fn size_calc(low: usize) -> usize
{
match low {
0 | 1 => low,
2 => 4,
x if x % 2 == 0 => x * 2,
odd => (odd * 2) - 1
}
}
}
type JoinerExt = Joiner<std::convert::Infallible, std::convert::Infallible>;
impl<I, F> Iterator for Joiner<I, F>
where I: Iterator, F: FnMut() -> I::Item
{
type Item = I::Item;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
let val = match self.2 {
false => self.0.next(),
true => Some(self.1())
};
if val.is_some() {
self.2 ^= true;
}
val
}
#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
let (low, high) = self.0.size_hint();
(Self::size_calc(low), high.map(Self::size_calc))
}
}
impl<I, T> Iterator for Joiner<I, CloneJoiner<T>>
where I: Iterator<Item = T>, T: Clone
{
type Item = I::Item;
#[inline]
fn next(&mut self) -> Option<Self::Item> {
let val = match self.2 {
false => self.0.next(),
true => Some(self.1.0.clone())
};
if val.is_some() {
self.2 ^= true;
}
val
}
#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
let (low, high) = self.0.size_hint();
(Self::size_calc(low), high.map(Self::size_calc))
}
}
impl<I, F> iter::FusedIterator for Joiner<I, F>
where Joiner<I,F>: Iterator,
I: iter::FusedIterator{}
impl<I, F> ExactSizeIterator for Joiner<I, F>
where Joiner<I,F>: Iterator,
I: ExactSizeIterator {}
pub trait IterJoinExt<T>: Sized
{
fn join_by<F: FnMut() -> T>(self, joiner: F) -> Joiner<Self, F>;
fn join_by_default(self) -> Joiner<Self, fn () -> T>
where T: Default;
fn join_by_clone(self, value: T) -> Joiner<Self, CloneJoiner<T>>
where T: Clone;
}
impl<I, T> IterJoinExt<T> for I
where I: Iterator<Item = T>
{
#[inline]
fn join_by<F: FnMut() -> T>(self, joiner: F) -> Joiner<Self, F> {
Joiner(self, joiner, false)
}
#[inline]
fn join_by_default(self) -> Joiner<Self, fn () -> T>
where T: Default
{
Joiner(self, T::default, false)
}
#[inline]
fn join_by_clone(self, value: T) -> Joiner<Self, CloneJoiner<T>>
where T: Clone {
Joiner(self, CloneJoiner(value), false)
}
}
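As written, `Joiner::next` alternates strictly between an item and a separator, so it also yields one separator after the last item. If a trailing separator is not wanted, a `Peekable`-based adaptor is one alternative. The following is a minimal sketch under that assumption; `Intersperse` and `intersperse` are hypothetical names and not part of this crate.

```rust
use std::iter::Peekable;

/// Hypothetical adaptor that yields `sep` only *between* items.
pub struct Intersperse<I: Iterator> {
    inner: Peekable<I>,
    sep: I::Item,
    /// True when an item was just yielded and a separator is owed
    /// before the next one (if there is a next one).
    pending_sep: bool,
}

pub fn intersperse<I>(iter: I, sep: I::Item) -> Intersperse<I>
where
    I: Iterator,
    I::Item: Clone,
{
    Intersperse { inner: iter.peekable(), sep, pending_sep: false }
}

impl<I> Iterator for Intersperse<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Only emit the separator if another item is known to follow.
        if self.pending_sep && self.inner.peek().is_some() {
            self.pending_sep = false;
            return Some(self.sep.clone());
        }
        let item = self.inner.next()?;
        self.pending_sep = true;
        Some(item)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // `n` remaining items produce `2n - 1` elements, plus one if a
        // separator is currently owed.
        let owed = usize::from(self.pending_sep);
        let calc = |n: usize| -> Option<usize> {
            match n {
                0 => Some(0),
                n => n.checked_mul(2)?.checked_sub(1)?.checked_add(owed),
            }
        };
        let (low, high) = self.inner.size_hint();
        (calc(low).unwrap_or(usize::MAX), high.and_then(calc))
    }
}
```

Peeking costs one buffered item, but it keeps the separator count and the size hint consistent with each other.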
pub trait IntoEyre<T>
{
fn into_eyre(self) -> eyre::Result<T>;
}
impl<T, E: EyreError> IntoEyre<T> for Result<T, E>
{
#[inline(always)]
fn into_eyre(self) -> eyre::Result<T> {
match self {
Err(e) => Err(eyre::Report::from(e)),
Ok(y) => Ok(y),
}
}
}
pub trait FlattenReports<T>
{
/// Flatten a `eyre::Result<eyre::Result<T>>` into an `eyre::Result<T>`
fn flatten(self) -> eyre::Result<T>;
}
pub trait FlattenEyreResult<T, E>
where E: EyreError
{
/// Flatten a `Result<Result<T, IE>, E>` into an `eyre::Result<T>`.
fn flatten(self) -> eyre::Result<T>;
}
pub trait FlattenResults<T, E>
{
/// Flatten a `Result<Result<T, IE>, E>` into a `Result<T, E>`.
fn flatten(self) -> Result<T, E>;
}
impl<T, E, IE: Into<E>> FlattenResults<T, E> for Result<Result<T, IE>, E>
{
/// Flatten a `Result<Result<T, impl Into<E>>, E>` into a `Result<T, E>`
///
/// This will convert the inner error into the type of the outer error.
#[inline]
fn flatten(self) -> Result<T, E> {
match self {
Err(e) => Err(e),
Ok(Ok(e)) => Ok(e),
Ok(Err(e)) => Err(e.into())
}
}
}
impl<T, IE: EyreError, E: EyreError> FlattenEyreResult<T, E> for Result<Result<T, IE>, E>
{
#[inline]
fn flatten(self) -> eyre::Result<T> {
match self {
Err(e) => Err(e).with_note(|| "Flattened report (outer)"),
Ok(Err(e)) => Err(e).with_warning(|| "Flattened report (inner)"),
Ok(Ok(a)) => Ok(a),
}
}
}
impl<T> FlattenReports<T> for eyre::Result<eyre::Result<T>>
{
#[inline]
fn flatten(self) -> eyre::Result<T> {
match self {
Err(e) => Err(e.with_note(|| "Flattened report (outer)")),
Ok(Err(e)) => Err(e.with_warning(|| "Flattened report (inner)")),
Ok(Ok(a)) => Ok(a),
}
}
}
impl<T> FlattenReports<T> for eyre::Result<Option<T>>
{
#[inline]
fn flatten(self) -> eyre::Result<T> {
match self {
Err(e) => Err(e.with_note(|| "Flattened report (outer)")),
Ok(None) => Err(eyre!("Value expected, but not found").with_section(|| format!("Option<{}>", std::any::type_name::<T>()).header("Option type was")).with_warning(|| "Flattened report (inner)")),
Ok(Some(a)) => Ok(a),
}
}
}
impl<T, E: EyreError> FlattenEyreResult<T, E> for Result<Option<T>, E>
{
#[inline]
fn flatten(self) -> eyre::Result<T> {
match self {
Err(e) => Err(e).with_note(|| "Flattened report (outer)"),
Ok(None) => Err(eyre!("Value expected, but not found")
.with_section(|| format!("Option<{}>", std::any::type_name::<T>())
.header("Option type was"))
.with_warning(|| "Flattened report (inner)")),
Ok(Some(a)) => Ok(a),
}
}
}
#[derive(Debug)]
enum RunOnceInternal<F>
{
Live(ManuallyDrop<F>),
Dead,
}
impl<F: Clone> Clone for RunOnceInternal<F>
{
#[inline]
fn clone(&self) -> Self {
match self {
Self::Live(l) => Self::Live(l.clone()),
_ => Self::Dead
}
}
}
impl<F> RunOnceInternal<F>
{
/// Take `F` now, unless it doesn't need to be dropped.
///
/// # Returns
/// * if `!needs_drop::<F>()`, `None` is always returned.
/// * if `self` is `Dead`, `None` is returned.
/// * if `self` is `Live(f)`, `Some(f)` is returned, and `self` is set to `Dead`.
#[inline(always)]
fn take_now_for_drop(&mut self) -> Option<F>
{
if mem::needs_drop::<F>() {
self.take_now()
} else {
None
}
}
/// If `Live`, return the value inside and set to `Dead`.
/// Otherwise, return `None`.
#[inline(always)]
fn take_now(&mut self) -> Option<F>
{
if let Self::Live(live) = self {
let val = unsafe {
ManuallyDrop::take(live)
};
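// Overwrite without running `Drop` on the old `Live` slot: its contents were just moved out, so dropping it here would drop `F` twice.
mem::forget(mem::replace(self, Self::Dead));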
Some(val)
} else {
None
}
}
}
impl<F> ops::Drop for RunOnceInternal<F>
{
#[inline]
fn drop(&mut self) {
if mem::needs_drop::<F>() {
if let Self::Live(func) = self {
unsafe { ManuallyDrop::drop(func) };
}
}
}
}
/// Holds a zero-argument closure that will only be run *once*.
#[derive(Debug, Clone)]
pub struct RunOnce<F, T>(PhantomData<fn () -> T>, RunOnceInternal<F>);
unsafe impl<T, F> Send for RunOnce<F, T>
where F: FnOnce() -> T + Send {}
impl<F, T> RunOnce<F, T>
where F: FnOnce() -> T
{
pub const fn new(func: F) -> Self
{
Self(PhantomData, RunOnceInternal::Live(ManuallyDrop::new(func)))
}
pub const fn never() -> Self
{
Self(PhantomData, RunOnceInternal::Dead)
}
#[inline]
pub fn try_take(&mut self) -> Option<F>
{
match &mut self.1 {
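RunOnceInternal::Live(func) => {
let func = unsafe { ManuallyDrop::take(func) };
// Mark the slot `Dead` (without dropping the moved-out `Live` value) so the closure cannot be taken or dropped a second time.
mem::forget(mem::replace(&mut self.1, RunOnceInternal::Dead));
Some(func)
},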
_ => None
}
}
#[inline]
pub fn try_run(&mut self) -> Option<T>
{
self.try_take().map(|func| func())
}
#[inline]
pub fn run(mut self) -> T
{
self.try_run().expect("Function has already been consumed")
}
#[inline]
pub fn take(mut self) -> F
{
self.try_take().expect("Function has already been consumed")
}
#[inline]
pub fn is_runnable(&self) -> bool
{
if let RunOnceInternal::Dead = &self.1 {
false
} else {
true
}
}
}
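The run-at-most-once contract of `RunOnce` can also be expressed without `unsafe` by storing the closure in an `Option` and using `Option::take`. This is a minimal sketch for comparison only; `Once` is a hypothetical name, and it deliberately omits the `PhantomData` return-type marker and the manual `Send` impl used above.

```rust
/// Hypothetical safe counterpart: the closure lives in an `Option`, and
/// taking it leaves `None` behind, so it can never run or drop twice.
pub struct Once<F>(Option<F>);

impl<F> Once<F> {
    pub fn new(func: F) -> Self {
        Self(Some(func))
    }

    /// A holder that never had a closure to run.
    pub fn never() -> Self {
        Self(None)
    }

    pub fn is_runnable(&self) -> bool {
        self.0.is_some()
    }

    /// Run the closure if it has not been consumed yet.
    pub fn try_run<T>(&mut self) -> Option<T>
    where
        F: FnOnce() -> T,
    {
        self.0.take().map(|func| func())
    }
}
```

For closure types without a niche this may cost an extra discriminant, which is the same trade-off the `Live`/`Dead` enum above already makes.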
#[inline(always)]
pub(crate) fn map_bool<T>(ok: bool, value: T) -> T
where T: Default
{
if ok {
value
} else {
T::default()
}
}
pub trait SealExt
{
fn try_seal(&self, shrink: bool, grow: bool, write: bool) -> io::Result<()>;
#[inline]
fn sealed(self, shrink: bool, grow: bool, write: bool) -> Self
where Self: Sized {
if let Err(e) = self.try_seal(shrink, grow, write) {
panic!("Failed to apply seals: {e}")
}
self
}
}
#[cfg(any(feature="memfile", feature="exec"))]
const _: () = {
impl<T: AsRawFd + ?Sized> SealExt for T
{
#[cfg_attr(feature="logging", instrument(skip(self)))]
fn sealed(self, shrink: bool, grow: bool, write: bool) -> Self
where Self: Sized {
use libc::{
F_SEAL_GROW, F_SEAL_SHRINK, F_SEAL_WRITE,
F_ADD_SEALS,
fcntl
};
let fd = self.as_raw_fd();
if unsafe {
fcntl(fd, F_ADD_SEALS
, map_bool(shrink, F_SEAL_SHRINK)
| map_bool(grow, F_SEAL_GROW)
| map_bool(write, F_SEAL_WRITE))
} < 0 {
panic!("Failed to apply seals to file descriptor {fd}: {}", io::Error::last_os_error())
}
self
}
#[cfg_attr(feature="logging", instrument(skip(self), err))]
fn try_seal(&self, shrink: bool, grow: bool, write: bool) -> io::Result<()> {
use libc::{
F_SEAL_GROW, F_SEAL_SHRINK, F_SEAL_WRITE,
F_ADD_SEALS,
fcntl
};
let fd = self.as_raw_fd();
if unsafe {
fcntl(fd, F_ADD_SEALS
, map_bool(shrink, F_SEAL_SHRINK)
| map_bool(grow, F_SEAL_GROW)
| map_bool(write, F_SEAL_WRITE))
} < 0 {
Err(io::Error::last_os_error())
} else {
Ok(())
}
}
}
};

@ -1,12 +1,11 @@
//#![feature(const_trait_impl)]
#[macro_use] extern crate cfg_if; #[macro_use] extern crate cfg_if;
#[cfg(feature="logging")] #[cfg(feature="logging")]
#[macro_use] extern crate tracing; #[macro_use] extern crate tracing;
#[cfg(feature="memfile")] //#[cfg(feature="memfile")]
#[macro_use] extern crate lazy_static; #[macro_use] extern crate lazy_static;
#[cfg(feature="memfile")]
extern crate stackalloc;
/// Run this statement only if `tracing` is enabled /// Run this statement only if `tracing` is enabled
macro_rules! if_trace { macro_rules! if_trace {
@ -24,6 +23,21 @@ macro_rules! if_trace {
} }
} }
}; };
(true $yes:expr$(; $no:expr)?) => {
{
#[allow(unused_variables)]
{
let val = cfg!(feature="logging");
#[cfg(feature="logging")]
let val = { $yes };
$(
#[cfg(not(feature="logging"))]
let val = { $no };
)?
val
}
}
};
} }
#[cfg(feature="jemalloc")] #[cfg(feature="jemalloc")]
@ -65,12 +79,17 @@ macro_rules! function {
}} }}
} }
mod ext; use ext::*;
mod errors;
mod sys; mod sys;
use sys::{ use sys::{
try_get_size, try_get_size,
tell_file, tell_file,
}; };
#[cfg(feature="exec")]
mod exec;
mod buffers; mod buffers;
use buffers::prelude::*; use buffers::prelude::*;
@ -82,6 +101,63 @@ use bytes::{
BufMut, BufMut,
}; };
/* TODO: Allow `collect -exec <command>` /proc/self/fds/<memfd OR STDOUT_FILENO>, `collect -exec{} <command> {/proc/self/fds/<memfd OR STDOUT_FILENO>} <other args>`
struct Options {
/// If the arguments vector contains `None`, that `None` shall be replaced with the string referring to: If in memfd mode: The `memfd_create()` buffer fd, set to RW, truncated to size, seeked to 0. In this mode, the file will remain open when `exec` is not `None`, and will instead be returned below, as the mode methods will all be modified to return `Option<Box<dyn ModeReturn + 'static>>::Some(<private struct: impl ModeReturn>)`, which will contain the memfd object so it is dropped *after* the child process has exited. If the mode is *not* memfd, then `STDOUT_FILENO` itself will be used; also set to RW, truncated correctly, and seeked to 0. The return code of this process shall be the return code of the child process once it has terminated.
/// Execution of commands (if passed) **always** happens *after* the copy to `stdout`, but *before* the **close** of `stdout`. If the copy to `stdout` fails, the exec will not be executed regardless of if the mode required is actually using `stdout`.
/// The process shall always wait for the child to terminate before exiting. If the child daemon forks, that fork is not followed, and the process exits anyway.
/// Ideally, A `SIGHUP` handler should be registered, which tells the parent to stop waiting on the child and exit now. TODO: The behaviour of the child is unspecified if this happens. It may be killed, or re-attached to `init`. But the return code of the parent should always be `0` in this case.
exec: Option<(OSString, Vec<Option<OSString>>)>
}
trait ModeReturn: Send {
fn get_fd_path(&self) -> &Path;
}
struct BufferedReturn;
impl ModeReturn for BufferedReturn { fn get_fd_str(&self) -> &OsStr{ static_assert(STDOUT_FILENO == 1); "/proc/self/fds/1" }} /* XXX: In the case where the (compile time) check of STDOUT_FILENO == 0 fails, another boxed struct containing the OSString with the correct path that `impl ModeReturn` can be returned, this path will be removed by the compiler if `STDOUT_FILENO != 1`, allowing for better unboxing analysis. */
*/
mod args;
#[derive(Debug)]
pub struct NoFile(std::convert::Infallible);
impl AsRawFd for NoFile
{
#[inline]
fn as_raw_fd(&self) -> RawFd {
match self.0{}
}
}
trait ModeReturn: Send {
type ExecFile: AsRawFd;
fn get_exec_file(self) -> Option<Self::ExecFile>;
}
impl ModeReturn for () {
type ExecFile = NoFile;
#[inline(always)]
fn get_exec_file(self) -> Option<Self::ExecFile> {
None
}
}
impl ModeReturn for io::Stdout {
type ExecFile = Self;
#[inline(always)]
fn get_exec_file(self) -> Option<Self::ExecFile> {
Some(self)
}
}
impl ModeReturn for std::fs::File {
type ExecFile = Self;
#[inline(always)]
fn get_exec_file(self) -> Option<Self::ExecFile> {
Some(self)
}
}
fn init() -> eyre::Result<()> fn init() -> eyre::Result<()>
{ {
cfg_if!{ if #[cfg(feature="logging")] { cfg_if!{ if #[cfg(feature="logging")] {
@ -132,13 +208,32 @@ fn feature_check() -> eyre::Result<()>
Ok(()) Ok(())
} }
#[inline]
#[cfg_attr(feature="logging", instrument(skip_all, fields(fd = ?file.as_raw_fd())))]
fn try_seal_size<F: AsRawFd + ?Sized>(file: &F) -> eyre::Result<()>
{
//if cfg!(feature="exec") {
if let Err(err) = file.try_seal(true,true,false)
.with_section(|| format!("Raw file descriptor: {}", file.as_raw_fd()).header("Attempted seal was on"))
.with_warning(|| "This may cause consumers of -exec{} to misbehave") {
let fd = file.as_raw_fd();
if_trace!{{
warn!("Failed to seal file descriptor {fd}: {err}\n\t{err:?}");
}}
Err(err).wrap_err("Failed to seal file's length")
} else {
Ok(())
}
//}
}
mod work { mod work {
use super::*; use super::*;
#[cfg_attr(feature="logging", instrument(err))] #[cfg_attr(feature="logging", instrument(err))]
#[inline] #[inline]
pub(super) fn buffered() -> eyre::Result<()> pub(super) fn buffered() -> eyre::Result<impl ModeReturn>
{ {
if_trace!(trace!("strategy: allocated buffer")); if_trace!(info!("strategy: allocated buffer"));
let (bytes, read) = { let (bytes, read) = {
let stdin = io::stdin(); let stdin = io::stdin();
@ -153,8 +248,9 @@ mod work {
}; };
if_trace!(info!("collected {read} from stdin. starting write.")); if_trace!(info!("collected {read} from stdin. starting write."));
let stdout = io::stdout();
let written = let written =
io::copy(&mut (&bytes[..read]).reader() , &mut io::stdout().lock()) io::copy(&mut (&bytes[..read]).reader() , &mut stdout.lock())
.with_section(|| read.header("Bytes read")) .with_section(|| read.header("Bytes read"))
.with_section(|| bytes.len().header("Buffer length (frozen)")) .with_section(|| bytes.len().header("Buffer length (frozen)"))
.with_section(|| format!("{:?}", &bytes[..read]).header("Read Buffer")) .with_section(|| format!("{:?}", &bytes[..read]).header("Read Buffer"))
@ -167,14 +263,14 @@ mod work {
.wrap_err("Writing failed: size mismatch"); .wrap_err("Writing failed: size mismatch");
} }
Ok(()) Ok(stdout)
} }
#[cfg_attr(feature="logging", instrument(err))] #[cfg_attr(feature="logging", instrument(err))]
#[inline] #[inline]
#[cfg(feature="memfile")] #[cfg(feature="memfile")]
//TODO: We should establish a max memory threshold for this to prevent full system OOM: Output a warning message if it exceeeds, say, 70-80% of free memory (not including used by this program (TODO: How do we calculate this efficiently?)), and fail with an error if it exceeds 90% of memory... Or, instead of using free memory as basis of the requirement levels on the max size of the memory file, use max memory? Or just total free memory at the start of program? Or check free memory each time (slow!! probably not this one...). Umm... I think basing it off total memory would be best; perhaps make the percentage levels user-configurable at compile time (and allow the user to set the memory value as opposed to using the total system memory at runtime.) or runtime (compile-time preffered; use that crate that lets us use TOML config files at comptime (find it pretty easy by looking through ~/work's rust projects, I've used it before.)) //TODO: We should establish a max memory threshold for this to prevent full system OOM: Output a warning message if it exceeeds, say, 70-80% of free memory (not including used by this program (TODO: How do we calculate this efficiently?)), and fail with an error if it exceeds 90% of memory... Or, instead of using free memory as basis of the requirement levels on the max size of the memory file, use max memory? Or just total free memory at the start of program? Or check free memory each time (slow!! probably not this one...). Umm... I think basing it off total memory would be best; perhaps make the percentage levels user-configurable at compile time (and allow the user to set the memory value as opposed to using the total system memory at runtime.) or runtime (compile-time preffered; use that crate that lets us use TOML config files at comptime (find it pretty easy by looking through ~/work's rust projects, I've used it before.))
pub(super) fn memfd() -> eyre::Result<()> pub(super) fn memfd() -> eyre::Result<impl ModeReturn>
{ {
const DEFAULT_BUFFER_SIZE: fn () -> Option<std::num::NonZeroUsize> = || { const DEFAULT_BUFFER_SIZE: fn () -> Option<std::num::NonZeroUsize> = || {
cfg_if!{ cfg_if!{
@ -189,7 +285,7 @@ mod work {
} }
}; };
if_trace!(trace!("strategy: mapped memory file")); if_trace!(info!("strategy: mapped memory file"));
use std::borrow::Borrow; use std::borrow::Borrow;
@ -202,12 +298,135 @@ mod work {
.unwrap_or_else(|e| format!("<unknown: {e}>")) .unwrap_or_else(|e| format!("<unknown: {e}>"))
} }
#[cfg_attr(feature="logging", instrument(skip_all, err, fields(i = ?i.as_raw_fd())))]
#[inline]
fn truncate_file<S>(i: impl AsRawFd, to: S) -> eyre::Result<()>
where S: TryInto<u64>,
<S as TryInto<u64>>::Error: EyreError
{
truncate_file_raw(i, to.try_into().wrap_err(eyre!("Size too large"))?)?;
Ok(())
}
fn truncate_file_raw(i: impl AsRawFd, to: impl Into<u64>) -> io::Result<()>
{
use libc::ftruncate;
let fd = i.as_raw_fd();
let to = {
let to = to.into();
#[cfg(feature="logging")]
let span_size_chk = debug_span!("chk_size", size = ?to);
#[cfg(feature="logging")]
let _span = span_size_chk.enter();
if_trace!{
if to > i64::MAX as u64 {
error!("Size too large (over max by {}) (max {})", to - (i64::MAX as u64), i64::MAX);
} else {
trace!("Setting {fd} size to {to}");
}
}
if cfg!(debug_assertions) {
i64::try_from(to).map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "Size too large for ftruncate() offset"))?
} else {
to as i64
}
};
match unsafe { ftruncate(fd, to) } {
-1 => Err(io::Error::last_os_error()),
_ => Ok(())
}
}
//TODO: How to `ftruncate()` stdout only once... If try_get_size succeeds, we want to do it then. If it doesn't, we want to do it when `stdin` has been consumed and we know the size of the memory-file... `RunOnce` won't work unless we can give it an argument....
#[allow(unused_mut)]
let mut set_stdout_len = {
cfg_if! {
if #[cfg(feature="memfile-size-output")] {
if_trace!(warn!("Feature `memfile-size-output` is not yet stable and will cause crash."));
const STDOUT: memfile::fd::RawFileDescriptor = unsafe { memfile::fd::RawFileDescriptor::new_unchecked(libc::STDOUT_FILENO) }; //TODO: Get this from `std::io::Stdout.as_raw_fd()` instead.
use std::sync::atomic::{self, AtomicUsize};
#[cfg(feature="logging")]
let span_ro = debug_span!("run_once", stdout = ?STDOUT);
static LEN_HOLDER: AtomicUsize = AtomicUsize::new(0);
let mut set_len = RunOnce::new(move || {
#[cfg(feature="logging")]
let _span = span_ro.enter();
let len = LEN_HOLDER.load(atomic::Ordering::Acquire);
if_trace!(debug!("Attempting single `ftruncate()` on `STDOUT_FILENO` -> {len}"));
truncate_file(STDOUT, len)
.wrap_err(eyre!("Failed to set length of stdout ({STDOUT}) to {len}"))
});
move |len: usize| {
#[cfg(feature="logging")]
let span_ssl = info_span!("set_stdout_len", len = ?len);
#[cfg(feature="logging")]
let _span = span_ssl.enter();
if_trace!(trace!("Setting static-stored len for RunOnce"));
LEN_HOLDER.store(len, atomic::Ordering::Release);
if_trace!(trace!("Calling RunOnce for `set_stdout_len`"));
match set_len.try_run() {
Some(result) => result
.with_section(|| len.header("Attempted length set was"))
.with_warning(|| libc::off_t::MAX.header("Max length is"))
.with_note(|| STDOUT.header("STDOUT_FILENO is")),
None => {
if_trace!(warn!("Already called `set_stdout_len()`"));
Ok(())
},
}
}
} else {
|len: usize| -> Result<(), std::convert::Infallible> {
#[cfg(feature="logging")]
let span_ssl = info_span!("set_stdout_len", len = ?len);
#[cfg(feature="logging")]
let _span = span_ssl.enter();
if_trace!(info!("Feature `memfile-size-output` is disabled; ignoring."));
let _ = len;
Ok(())
}
}
}
};
let (mut file, read) = { let (mut file, read) = {
let stdin = io::stdin(); let stdin = io::stdin();
let buffsz = try_get_size(&stdin); let buffsz = try_get_size(&stdin);
if_trace!(debug!("Attempted determining input size: {:?}", buffsz)); if_trace!(debug!("Attempted determining input size: {:?}", buffsz));
let buffsz = buffsz.or_else(DEFAULT_BUFFER_SIZE); let buffsz = if cfg!(feature="memfile-size-output") {
//TODO: XXX: Even if this actually works, is it safe to do this? Won't the consumer try to read `value` bytes before we've written them? Perhaps remove pre-setting entirely...
match buffsz {
y @ Some(ref value) => {
let value = value.get();
set_stdout_len(value).wrap_err("Failed to set stdout len to that of stdin")
.with_section(|| value.header("Stdin len was calculated as"))
.with_warning(|| "This is a pre-setting")?;
y
},
n => n,
}
} else { buffsz }.or_else(DEFAULT_BUFFER_SIZE);
if_trace!(if let Some(buf) = buffsz.as_ref() { if_trace!(if let Some(buf) = buffsz.as_ref() {
trace!("Failed to determine input size: preallocating to {}", buf); trace!("Failed to determine input size: preallocating to {}", buf);
} else { } else {
@ -306,6 +525,18 @@ mod work {
}; };
if_trace!(info!("collected {} from stdin. starting write.", read)); if_trace!(info!("collected {} from stdin. starting write.", read));
// Seal memfile
let _ = try_seal_size(&file);
// Now copy memfile to stdout
// TODO: XXX: Currently causes crash. But if we can get this to work, leaving this in is definitely safe (as opposed to the pre-setting (see above.))
set_stdout_len(read)
.wrap_err(eyre!("Failed to `ftruncate()` stdout after collection of {read} bytes"))
.with_note(|| "Was not pre-set")?;
let written = let written =
io::copy(&mut file, &mut io::stdout().lock()) io::copy(&mut file, &mut io::stdout().lock())
.with_section(|| read.header("Bytes read from stdin")) .with_section(|| read.header("Bytes read from stdin"))
@ -318,25 +549,120 @@ mod work {
.wrap_err("Writing failed: size mismatch"); .wrap_err("Writing failed: size mismatch");
} }
Ok(()) Ok(file)
} }
} }
#[cfg_attr(feature="logging", instrument(err))] #[cfg_attr(feature="logging", instrument(err))]
fn main() -> eyre::Result<()> { #[inline(always)]
unsafe fn close_raw_fileno(fd: RawFd) -> io::Result<()>
{
match libc::close(fd) {
0 => Ok(()),
_ => Err(io::Error::last_os_error()),
}
}
#[inline]
#[cfg_attr(feature="logging", instrument(skip_all, fields(T = ?std::any::type_name::<T>())))]
fn close_fileno<T: IntoRawFd>(fd: T) -> eyre::Result<()>
{
let fd = fd.into_raw_fd();
if fd < 0 {
return Err(eyre!("Invalid fd").with_note(|| format!("fds begin at 0 and end at {}", RawFd::MAX)));
} else {
if_trace!(debug!("closing consumed fd {fd}"));
unsafe {
close_raw_fileno(fd)
}.wrap_err("Failed to close fd")
.with_section(move || fd.header("Fileno was"))
.with_section(|| std::any::type_name::<T>().header(""))
}
}
fn parse_args() -> eyre::Result<args::Options>
{
args::parse_args()
.wrap_err("Parsing arguments failed")
.with_section(|| std::env::args_os().skip(1)
.map(|x| std::borrow::Cow::Owned(format!("{x:?}")))
.join_by_clone(std::borrow::Cow::Borrowed(" ")) //XXX: this can be replaced by `flat_map() -> [x, " "]` really... Dunno which will be faster...
.collect::<String>()
.header("Program arguments (argv+1) were"))
.with_section(|| args::program_name().header("Program name (*argv) was"))
.with_section(|| std::env::args_os().len().header("Total number of arguments, including program name (argc) was"))
.with_suggestion(|| "Try passing `--help`")
}
#[cfg_attr(feature="logging", instrument(err))]
fn main() -> errors::DispersedResult<()> {
init()?; init()?;
feature_check()?; feature_check()?;
if_trace!(debug!("initialised")); if_trace!(debug!("initialised"));
//TODO: How to cleanly feature-gate `args`?
let opt = { cfg_if!{
if #[cfg(feature="exec")] {
#[cfg(feature="logging")]
let _span = debug_span!("args");
#[cfg(feature="logging")]
let _in_span = _span.enter();
let parsed = parse_args()?;
if_trace!(debug!("Parsed arguments: {parsed:?}"));
parsed
} else {
()
}
} };
//TODO: maybe look into fd SEALing? Maybe we can prevent a consumer process from reading from stdout until we've finished the transfer. The name SEAL sounds like it might have something to do with that?
let execfile;
cfg_if!{ cfg_if!{
if #[cfg(feature="memfile")] { if #[cfg(feature="memfile")] {
work::memfd() execfile = work::memfd()
.wrap_err("Operation failed").with_note(|| "Strategy was `memfd`")?; .wrap_err("Operation failed").with_note(|| "Strategy was `memfd`")?;
} else { } else {
work::buffered() execfile = work::buffered()
.wrap_err("Operation failed").with_note(|| "Strategy was `buffered`")?; .wrap_err("Operation failed").with_note(|| "Strategy was `buffered`")?;
} }
} }
// Transfer complete, run exec if enabled
let rc = { cfg_if! {
if #[cfg(feature="exec")] {
let rc = if let Some(file) = execfile.get_exec_file() {
exec::spawn_from_sync(&file, opt).into_iter().try_fold(0i32, |opt, res| res.map(|x| opt | x.unwrap_or(0)))
} else {
if_trace!(debug!("there is no file to apply potential -exec/{{}} to"));
Ok(0i32)
}.wrap_err("-exec/{} operations failed")?;
if_trace!(match rc {
0 => trace!("-exec/{{}} operation(s all) returned 0 exit status"),
n => error!("-exec/{{}} operation(s) returned non-zero exit code (total: {}) or were killed by signal", n),
});
rc
} else {
0i32
}
} };
// Now that transfer is complete from buffer to `stdout`, close `stdout` pipe before exiting process.
if_trace!(info!("Transfer complete, closing `stdout` pipe"));
{
let stdout_fd = libc::STDOUT_FILENO; // (io::Stdout does not impl `IntoRawFd`, just use the raw fd directly; using the constant from libc may help in weird cases where STDOUT_FILENO is not 1...)
debug_assert_eq!(stdout_fd, std::io::stdout().as_raw_fd(), "STDOUT_FILENO and io::stdout().as_raw_fd() are not returning the same value.");
close_fileno(/*std::io::stdout().as_raw_fd()*/ stdout_fd) // SAFETY: We just assume fd 1 is still open. If it's not (i.e. already been closed), this will return error.
.with_section(move || stdout_fd.header("Attempted to close this fd (STDOUT_FILENO)"))
.with_warning(|| format!("It is possible fd {} (STDOUT_FILENO) has already been closed; if so, look for where that happens and prevent it. `stdout` should be closed here.", stdout_fd).header("Possible bug"))
}.wrap_err(eyre!("Failed to close stdout"))?;
if rc != 0 {
if cfg!(feature="exec") {
if_trace!(error!("Exiting with non-zero code due to child(s) returning non-zero exit status")); //TODO: A runtime flag to disable this? TODO: Also, a flag to stop printing to stdout so consumers of output can use just `-exec/{}` child process `stdout`s is enabled
}
std::process::exit(rc);
}
Ok(()) Ok(())
} }

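The `-exec/{}` branch of `main()` above folds the children's results into a single exit code. A minimal standalone sketch of that fold, assuming each child yields a `Result<Option<i32>, E>` as the commit message for `spawn_from_sync()` describes. `combine_exit_codes` is a hypothetical name used only for illustration, not a function in the repository:

```rust
/// Hypothetical helper: OR together the exit codes of all children.
///
/// * `Ok(Some(n))` is a child that exited with code `n`.
/// * `Ok(None)` is a child that ended without an exit code; as in the diff,
///   it contributes `0` to the total.
/// * The first `Err` stops the fold and is returned unchanged.
fn combine_exit_codes<I, E>(results: I) -> Result<i32, E>
where
    I: IntoIterator<Item = Result<Option<i32>, E>>,
{
    results
        .into_iter()
        .try_fold(0i32, |acc, res| res.map(|code| acc | code.unwrap_or(0)))
}
```

Because a missing exit code is folded in as `0`, a run in which every child terminates without a code would still report `0`; whether that is the intended behaviour is a design question the diff leaves to the caller.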
@ -16,7 +16,7 @@ use std::{
pub mod fd; pub mod fd;
pub mod error; pub mod error;
mod map; mod map;
//TODO: #[cfg(feature="hugetlb")] #[cfg(feature="hugetlb")]
mod hp; mod hp;
@ -110,7 +110,7 @@ impl RawFile
} }
#[inline(always)] #[inline(always)]
const fn take_ownership_of_unchecked(fd: RawFd) -> Self pub(crate) const fn take_ownership_of_unchecked(fd: RawFd) -> Self
{ {
//! **Internal**: Non-`unsafe` and `const` version of `take_ownership_of_raw_unchecked()` //! **Internal**: Non-`unsafe` and `const` version of `take_ownership_of_raw_unchecked()`
//! : assumes `fd` is `>= 0` //! : assumes `fd` is `>= 0`
@ -279,13 +279,53 @@ impl RawFile
opt.borrow().open(path).map(Into::into) opt.borrow().open(path).map(Into::into)
} }
/// Open a new in-memory (W+R) file with an optional name and a fixed size. /// Allocates `size` bytes for this file.
///
/// # Note
/// This does not *extend* the file's capacity, it is instead similar to `fs::File::set_len()`.
#[cfg_attr(feature="logging", instrument(err))]
#[inline]
pub fn allocate_size(&mut self, size: u64) -> io::Result<()>
{
use libc::{ fallocate, off_t};
if_trace!(trace!("attempting fallocate({}, 0, 0, {size}) (max offset: {})", self.0.get(), off_t::MAX));
match unsafe { fallocate(self.0.get(), 0, 0, if cfg!(debug_assertions) {
size.try_into().map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "Offset larger than max offset size"))?
} else { size as off_t }) } { //XXX is this bitwise AND check needed? fallocate() should already error if the size is negative with these parameters, no?
-1 => Err(io::Error::last_os_error()),
_ => Ok(())
}
}
/// Sets the size of this file.
///
/// The only real difference is that this will work on a `hugetlbfs` file, whereas `allocate_size()` will not.
/// # Note
/// This is essentially `fs::File::set_len()`.
#[cfg_attr(feature="logging", instrument(err))] #[cfg_attr(feature="logging", instrument(err))]
#[inline]
pub fn truncate_size(&mut self, size: u64) -> io::Result<()>
{
use libc::{ ftruncate, off_t};
if_trace!(trace!("attempting ftruncate({}, {size}) (max offset: {})", self.0.get(), off_t::MAX));
match unsafe { ftruncate(self.0.get(), if cfg!(debug_assertions) {
size.try_into().map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "Offset larger than max offset size"))?
} else { size as off_t }) } {
-1 => Err(io::Error::last_os_error()),
_ => Ok(())
}
}
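Both `allocate_size()` and `truncate_size()` above convert a `u64` size to the signed offset type, checking the conversion only in debug builds. A minimal sketch of doing the check unconditionally, assuming a 64-bit `off_t` (modelled here as `i64`); `to_offset` is a hypothetical helper name, not part of the diff:

```rust
use std::io;

/// Hypothetical helper: convert a size to a signed 64-bit file offset,
/// failing with `InvalidInput` instead of wrapping to a negative value.
fn to_offset(size: u64) -> io::Result<i64> {
    i64::try_from(size).map_err(|_| {
        io::Error::new(
            io::ErrorKind::InvalidInput,
            "Offset larger than max offset size",
        )
    })
}
```

In release builds the diff falls back to `size as off_t`, which wraps oversized values to a negative offset and leaves the system call to reject it; the checked form reports the problem before any call is made.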
/// Open a new in-memory (W+R) file with an optional name and a fixed size.
#[cfg_attr(feature="logging", instrument(level="debug", skip_all, err))]
pub fn open_mem(name: Option<&str>, len: usize) -> Result<Self, error::MemfileError> pub fn open_mem(name: Option<&str>, len: usize) -> Result<Self, error::MemfileError>
{ {
use std::{
ffi::CString,
borrow::Cow,
};
lazy_static! { lazy_static! {
static ref DEFAULT_NAME: String = format!(concat!("<memfile@", file!(), "->", "{}", ":", line!(), "-", column!(), ">"), function!()); //TODO: If it turns out memfd_create() requires an `&'static str`; remove the use of stackalloc, and have this variable be a nul-terminated CString instead. static ref DEFAULT_NAME: CString = CString::new(format!(concat!("<memfile@", file!(), "->", "{}", ":", line!(), "-", column!(), ">"), function!())).unwrap();
} }
use libc::{ use libc::{
@ -294,13 +334,13 @@ impl RawFile
}; };
use error::MemfileCreationStep::*; use error::MemfileCreationStep::*;
let rname = name.unwrap_or(&DEFAULT_NAME); let bname: Cow<CString> = match name {
Some(s) => Cow::Owned(CString::new(Vec::from(s)).expect("Invalid name")),
None => Cow::Borrowed(&DEFAULT_NAME),
};
stackalloc::alloca_zeroed(rname.len()+1, move |bname| { //XXX: Isn't the whole point of making `name` `&'static` that I don't know if `memfd_create()` requires static-lifetime name strings? TODO: Check this let bname = bname.as_bytes_with_nul();
#[cfg(feature="logging")] if_trace!(trace!("created nul-terminated buffer for name `{:?}': ({})", std::str::from_utf8(bname), bname.len()));
let _span = info_span!("stack_name_cpy", size = bname.len());
#[cfg(feature="logging")]
let _span_lock = _span.enter();
macro_rules! attempt_call macro_rules! attempt_call
{ {
@ -318,18 +358,15 @@ impl RawFile
} }
} }
if_trace!(trace!("copying {rname:p} `{rname}' (sz: {}) -> nul-terminated {:p}", rname.len(), bname)); let fd = attempt_call!(-1, memfd_create(bname.as_ptr() as *const _, MEMFD_CREATE_FLAGS), Create(name.map(str::to_owned), MEMFD_CREATE_FLAGS))
let bname = {
unsafe {
std::ptr::copy_nonoverlapping(rname.as_ptr(), bname.as_mut_ptr(), rname.len());
}
debug_assert_eq!(bname[rname.len()], 0, "Copied name string not null-terminated?");
bname.as_ptr()
};
let fd = attempt_call!(-1, memfd_create(bname as *const _, MEMFD_CREATE_FLAGS), Create(name.map(str::to_owned), MEMFD_CREATE_FLAGS))
.map(Self::take_ownership_of_unchecked)?; // Ensures `fd` is dropped if any subsequent calls fail .map(Self::take_ownership_of_unchecked)?; // Ensures `fd` is dropped if any subsequent calls fail
#[cfg(feature="logging")]
let using_memfile = debug_span!("setup_memfd", fd = ?fd.0.get());
{
#[cfg(feature="logging")]
let _span = using_memfile.enter();
if len > 0 { if len > 0 {
attempt_call!(-1 attempt_call!(-1
, fallocate(fd.0.get(), 0, 0, len.try_into() , fallocate(fd.0.get(), 0, 0, len.try_into()
@ -343,14 +380,14 @@ impl RawFile
, io::Error::last_os_error()) , io::Error::last_os_error())
.expect("Failed to check seek position in fd") .expect("Failed to check seek position in fd")
, 0, "memfd seek position is non-zero after fallocate()"); , 0, "memfd seek position is non-zero after fallocate()");
if_trace!(if seeked != 0 { warn!("Trace offset is non-zero: {seeked}") } else { trace!("Trace offset verified ok") }); if_trace!(if seeked != 0 { warn!("Seek offset is non-zero: {seeked}") } else { trace!("Seek offset verified ok") });
} }
} else { } else {
if_trace!(trace!("No length provided, skipping fallocate() call")); if_trace!(trace!("No length provided, skipping fallocate() call"));
} }
}
Ok(fd) Ok(fd)
})
} }
} }
@ -479,9 +516,11 @@ mod tests
file.seek(SeekFrom::Start(0))?; file.seek(SeekFrom::Start(0))?;
file file
}; };
let v: Vec<u8> = stackalloc::alloca_zeroed(STRING.len(), |buf| { let v = {
file.read_exact(buf).map(|_| buf.into()) let mut buf = vec![0; STRING.len()];
})?; file.read_exact(&mut buf[..])?;
buf
};
assert_eq!(v.len(), STRING.len(), "Invalid read size."); assert_eq!(v.len(), STRING.len(), "Invalid read size.");
assert_eq!(&v[..], &STRING[..], "Invalid read data."); assert_eq!(&v[..], &STRING[..], "Invalid read data.");

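The `open_mem()` change above replaces the stack-allocated copy of the name with a `CString`, and uses `expect("Invalid name")` when the caller's name contains an interior NUL byte. A minimal sketch of the same conversion that returns an error instead of panicking; `memfd_name` and the `DEFAULT_NAME` value here are hypothetical stand-ins, not the repository's definitions:

```rust
use std::ffi::CString;
use std::io;

/// Hypothetical default, standing in for the generated `DEFAULT_NAME`.
const DEFAULT_NAME: &str = "<memfile>";

/// Build the nul-terminated name to hand to a `memfd_create()`-style call.
/// An interior NUL in `name` becomes an `InvalidInput` error.
fn memfd_name(name: Option<&str>) -> io::Result<CString> {
    CString::new(name.unwrap_or(DEFAULT_NAME))
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))
}
```

The resulting `CString` owns its buffer, so the pointer from `as_ptr()` stays valid for as long as the value is kept alive across the call.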
@ -109,10 +109,19 @@ impl fmt::Display for BadFDError
pub type FileNo = RawFd; pub type FileNo = RawFd;
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] #[derive(Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
#[repr(transparent)] #[repr(transparent)]
pub struct RawFileDescriptor(NonNegativeI32); pub struct RawFileDescriptor(NonNegativeI32);
impl fmt::Debug for RawFileDescriptor
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
write!(f, "RawFileDescriptor({})", self.0.get())
}
}
impl RawFileDescriptor impl RawFileDescriptor
{ {
pub const STDIN: Self = Self(unsafe { NonNegativeI32::new_unchecked(0) }); pub const STDIN: Self = Self(unsafe { NonNegativeI32::new_unchecked(0) });

@ -13,8 +13,11 @@
use super::*; use super::*;
use std::{ use std::{
path::Path, path::Path,
ops,
fmt,
}; };
use libc::{ use libc::{
c_uint, c_int,
MFD_HUGETLB, MFD_HUGETLB,
MAP_HUGE_SHIFT, MAP_HUGE_SHIFT,
}; };
@ -25,21 +28,652 @@ use libc::{
/// The contents of those subdirectories themselves are irrelevant for our purpose. /// The contents of those subdirectories themselves are irrelevant for our purpose.
pub const HUGEPAGE_SIZES_LOCATION: &'static str = "/sys/kernel/mm/hugepages"; pub const HUGEPAGE_SIZES_LOCATION: &'static str = "/sys/kernel/mm/hugepages";
/// Should creation of `Mask`s from extracted kernel information be subject to integer conversion checks?
///
/// This is `true` on debug builds or if the feature `hugepage-checked-masks` is enabled.
const CHECKED_MASK_CREATION: bool = if cfg!(feature="hugepage-checked-masks") || cfg!(debug_assertions) { true } else { false };
/// Find all `Mask`s defined within this specific directory.
///
/// This is usually only useful when passed `HUGEPAGE_SIZES_LOCATION` unless doing something funky with it.
/// For most use-cases, `get_masks()` should be fine.
#[cfg_attr(feature="logging", instrument(err, skip_all, fields(path = ?path.as_ref())))]
#[inline]
pub fn get_masks_in<P>(path: P) -> eyre::Result<impl Iterator<Item=eyre::Result<SizedMask>> + 'static>
where P: AsRef<Path>
{
let path = path.as_ref();
let root_path = {
let path = path.to_owned();
move || path
};
let root_path_section = {
let root_path = root_path.clone();
move ||
root_path().to_string_lossy().into_owned().header("Root path was")
};
let dir = path.read_dir()
.wrap_err(eyre!("Failed to enumerate directory")
.with_section(root_path_section.clone()))?;
Ok(dir
.map(|x| x.map(|n| n.file_name()))
.map(|name| name.map(|name| (find_size_bytes(&name), name)))
.map(move |result| match result {
Ok((Some(ok), path)) => {
if CHECKED_MASK_CREATION {
Mask::new_checked(ok)
.wrap_err(eyre!("Failed to create mask from extracted bytes")
.with_section(|| ok.header("Bytes were"))
.with_section(move || format!("{path:?}").header("Checked path was"))
.with_section(root_path_section.clone()))
.and_then(|mask| Ok(SizedMask{mask, size: ok.try_into().wrap_err("Size was larger than `u64`")?}))
// .map(|mask| -> eyre::Result<_> { Ok(SizedMask{mask, size: ok.try_into()?}) })
} else {
Ok(SizedMask{ mask: Mask::new(ok), size: ok as u64 })
}
},
Ok((None, path)) => Err(eyre!("Failed to extract bytes from path"))
.with_section(move || format!("{path:?}").header("Checked path was"))
.with_section(root_path_section.clone()),
Err(e) => Err(e).wrap_err(eyre!("Failed to read path from which to extract bytes")
.with_section(root_path_section.clone()))
}))
}
/// Find all `Mask`s on this system.
#[cfg_attr(feature="logging", instrument(level="trace"))]
#[inline]
pub fn get_masks() -> eyre::Result<impl Iterator<Item=eyre::Result<SizedMask>> + 'static>
{
get_masks_in(HUGEPAGE_SIZES_LOCATION)
}
/// A huge-page mask that can be bitwise OR'd with `HUGETLB_MASK`, but retains the size of that huge-page.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Copy)]
pub struct SizedMask
{
mask: Mask,
size: u64,
}
impl SizedMask
{
#[inline]
pub const fn size(&self) -> u64
{
self.size
}
#[inline]
pub const fn as_mask(&self) -> &Mask
{
&self.mask
}
}
impl std::borrow::Borrow<Mask> for SizedMask
{
#[inline]
fn borrow(&self) -> &Mask {
&self.mask
}
}
impl std::ops::Deref for SizedMask
{
type Target = Mask;
#[inline]
fn deref(&self) -> &Self::Target {
&self.mask
}
}
impl From<SizedMask> for Mask
{
#[inline]
fn from(from: SizedMask) -> Self
{
from.mask
}
}
/// A huge-page mask that can be bitwise OR'd with `HUGETLB_MASK`.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Copy)]
#[repr(transparent)]
pub struct Mask(c_uint);
/// `Mask` and `SizedMask` trait impls
const _:() = {
macro_rules! mask_impls {
($name:ident) => {
impl fmt::Display for $name
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
write!(f, "{}", self.raw())
}
}
impl fmt::LowerHex for $name
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
write!(f, "0x{:x}", self.raw())
}
}
impl fmt::UpperHex for $name
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
write!(f, "0x{:X}", self.raw())
}
}
impl fmt::Binary for $name
{
#[inline]
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result
{
write!(f, "0b{:b}", self.raw())
}
}
// Comparisons
impl PartialEq<c_uint> for $name
{
#[inline]
fn eq(&self, &other: &c_uint) -> bool
{
self.mask() == other
}
}
impl PartialEq<c_int> for $name
{
#[inline]
fn eq(&self, &other: &c_int) -> bool
{
self.raw() == other
}
}
};
}
mask_impls!(Mask);
mask_impls!(SizedMask);
impl ops::BitOr<c_uint> for Mask
{
type Output= c_uint;
#[inline]
fn bitor(self, rhs: c_uint) -> Self::Output {
self.mask() | rhs
}
}
impl ops::BitOr for Mask
{
type Output= Self;
#[inline]
fn bitor(self, rhs: Self) -> Self::Output {
Self(self.0 | rhs.0)
}
}
impl ops::BitOrAssign for Mask
{
#[inline]
fn bitor_assign(&mut self, rhs: Self) {
self.0 |= rhs.0;
}
}
};
#[inline]
const fn log2_usize(x: usize) -> usize {
const BITS: usize = std::mem::size_of::<usize>() * 8usize; //XXX Is this okay to be hardcoded? I can't find CHAR_BIT in core, so...
BITS - (x.leading_zeros() as usize) - 1
}
impl Mask {
/// The shift mask used to calculate huge-page masks
pub const SHIFT: c_int = MAP_HUGE_SHIFT;
/// The raw bitmask applied to make the `MAP_HUGE_` mask available via `raw()` valid for `memfd_create()` in `mask()`
pub const HUGETLB_MASK: c_uint = MFD_HUGETLB;
#[cfg_attr(feature="logging", instrument(level="debug", err))]
#[inline]
pub fn new_checked(bytes: usize) -> eyre::Result<Self>
{
Ok(Self(c_uint::try_from(log2_usize(bytes))?
.checked_shl(Self::SHIFT as u32).ok_or(eyre!("Left shift would overflow"))?))
}
/// Create a new mask from a number of bytes.
///
/// This is unchecked and may overflow if the number of bytes is too large (in which case, there is likely a bug); for a checked version, use `new_checked()`.
#[inline]
pub const fn new(bytes: usize) -> Self
{
Self((log2_usize(bytes) as c_uint) << Self::SHIFT)
}
/// Create from a raw `MAP_HUGE_` mask.
///
/// # Safety
/// The caller **must** guarantee that `mask` is a valid `MAP_HUGE_` mask.
#[inline]
pub const unsafe fn from_raw(mask: c_uint) -> Self
{
Self(mask)
}
/// Get the raw `MAP_HUGE_` mask.
#[inline]
pub const fn raw(self) -> c_int
{
self.0 as c_int
}
/// Get a HUGETLB mask suitable for `memfd_create()` from this value.
#[inline]
pub const fn mask(self) -> c_uint
{
(self.raw() as c_uint) | Self::HUGETLB_MASK
}
/// Create a function that acts as `memfd_create()` with *only* this mask applied to it.
///
/// The `flags` argument is erased. To pass arbitrary flags to `memfd_create()`, use `memfd_create_raw_wrapper_flags()`
#[inline(always)]
pub const fn memfd_create_raw_wrapper(self) -> impl Fn (*const libc::c_char) -> c_int
{
use libc::memfd_create;
move |path| {
unsafe {
memfd_create(path, self.mask())
}
}
}
/// Create a function that acts as `memfd_create()` with this mask applied to it.
#[inline(always)]
pub const fn memfd_create_raw_wrapper_flags(self) -> impl Fn (*const libc::c_char, c_uint) -> c_int
{
use libc::memfd_create;
move |path, flag| {
unsafe {
memfd_create(path, flag | self.mask())
}
}
}
/// Create a function that acts as safe `memfd_create()` wrapper with this mask applied to it.
///
/// The `flags` argument is erased. To pass arbitrary flags to `memfd_create()`, use `memfd_create_wrapper_flags()`
/// # Returns
/// A RAII-guarded wrapper over the memory-file, or the `errno` in an `Err(io::Error)` if the operation failed.
#[inline]
pub const fn memfd_create_wrapper(self) -> impl Fn(*const libc::c_char) -> io::Result<super::RawFile>
{
let memfd_create = self.memfd_create_raw_wrapper();
move |path| {
match memfd_create(path) {
-1 => Err(io::Error::last_os_error()),
fd => Ok(super::RawFile::take_ownership_of_unchecked(fd))
}
}
}
/// Create a function that acts as safe `memfd_create()` wrapper with this mask applied to it.
/// # Returns
/// A RAII-guarded wrapper over the memory-file, or the `errno` in an `Err(io::Error)` if the operation failed.
#[inline]
pub const fn memfd_create_wrapper_flags(self) -> impl Fn(*const libc::c_char, c_uint) -> io::Result<super::RawFile>
{
let memfd_create = self.memfd_create_raw_wrapper_flags();
move |path, flags| {
match memfd_create(path, flags) {
-1 => Err(io::Error::last_os_error()),
fd => Ok(super::RawFile::take_ownership_of_unchecked(fd))
}
}
}
}
impl TryFrom<usize> for Mask
{
type Error = eyre::Report;
#[cfg_attr(feature="logging", instrument(level="trace", skip_all))]
#[inline(always)]
fn try_from(from: usize) -> Result<Self, Self::Error>
{
Self::new_checked(from)
}
}
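The `Mask` constructors above encode a huge-page size as `log2(bytes) << MAP_HUGE_SHIFT`. A minimal standalone sketch of that arithmetic, assuming the Linux value of `26` for `MAP_HUGE_SHIFT` and a 32-bit flag word; `huge_mask_checked` is a hypothetical name, and `usize::BITS` is used in place of the hard-coded `size_of::<usize>() * 8`:

```rust
/// Assumption: the Linux value of `MAP_HUGE_SHIFT`.
const MAP_HUGE_SHIFT: u32 = 26;

/// Hypothetical helper: encode a page size in bytes as a `MAP_HUGE_`-style mask.
/// Returns `None` for a size of zero or when the encoded value would not fit.
fn huge_mask_checked(bytes: usize) -> Option<u32> {
    if bytes == 0 {
        // log2(0) is undefined; the subtraction below would underflow.
        return None;
    }
    // Floor of log2, i.e. the index of the highest set bit.
    let log2 = usize::BITS - 1 - bytes.leading_zeros();
    // Only the bits above the shift are available for the size field.
    if log2 >= (1u32 << (u32::BITS - MAP_HUGE_SHIFT)) {
        return None;
    }
    Some(log2 << MAP_HUGE_SHIFT)
}
```

Note that `checked_shl()` only rejects a shift amount that is at least the bit width of the type, so in `new_checked()` it does not by itself detect value bits being shifted out; the explicit range check in the sketch is one way to cover that case.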
//TODO: add test `.memfd_create_wrapper{,_flags}()` usage, too with some `MAP_HUGE_` constants as sizes
/// Take a directory path and try to parse the hugepage size from it. /// Take a directory path and try to parse the hugepage size from it.
/// ///
/// All subdirectories from `HUGEPAGE_SIZES_LOCATION` should be passed to this, and the correct system-valid hugepage size will be returned for that specific hugepage. /// All subdirectories from `HUGEPAGE_SIZES_LOCATION` should be passed to this, and the correct system-valid hugepage size will be returned for that specific hugepage.
//TODO: Maybe make this `Result` instead? So we can know what part of the lookup is failing?
#[cfg_attr(feature="logging", instrument(ret, skip_all, fields(path = ?path.as_ref())))]
fn find_size_bytes(path: impl AsRef<Path>) -> Option<usize> fn find_size_bytes(path: impl AsRef<Path>) -> Option<usize>
{ {
const KMAP_TAGS: &[u8] = b"kmgbB"; //"bB";
const KMAP_SIZES: &[usize] = &[1024, 1024*1024, 1024*1024*1024, 0, 0]; // Having `b` and `B` (i.e. single bytes) be 0 means `sz` will be passed unmodified: the default multiplier is 1 and never 0. Making these two values 0 instead of 1 saves a single useless `* 1` call, but still allows for them to be looked up in `dir_bytes` when finding `k_loc` below.
/// Lookup the correct multiplier for `sz` to get the number of individual bytes from the IEC bytes-suffix `chr`.
/// Then, return the number of individual bytes of `sz` multiplied by the appropriate number dictated by the IEC suffix `chr` (if there is one.)
///
/// The lookup table is generated at compile-time from the constants `KMAP_TAGS` and `KMAP_SIZES`.
/// The indices of `KMAP_TAGS` and `KMAP_SIZES` should correspond: each suffix maps to the multiplier at the same index.
/// If a suffix is not found, a default multiplier of `1` is used (i.e. `sz` is returned un-multiplied.)
///
/// # Examples
/// * Where `sz = 10`, and `chr = b'k'` -> `10 * 1024` -> `10240` bytes (10 kB)
/// * Where `sz = 100`, and `chr = b'B'` -> `100` -> `100` bytes (100 bytes)
const fn kmap_lookup(sz: usize, chr: u8) -> usize {
const fn gen_kmap(tags: &[u8], sizes: &[usize]) -> [Option<NonZeroUsize>; 256] {
let mut output = [None; 256];
let mut i=0;
let len = if tags.len() < sizes.len() { tags.len() } else { sizes.len() };
while i < len {
output[tags[i] as usize] = NonZeroUsize::new(sizes[i]);
i += 1;
}
output
}
const KMAP: [Option<NonZeroUsize>; 256] = gen_kmap(KMAP_TAGS, KMAP_SIZES);
match KMAP[chr as usize] {
Some(mul) => sz * mul.get(),
None => sz,
}
}
let path= path.as_ref(); let path= path.as_ref();
if !path.is_dir() { /*if !path.is_dir() {
// These don't count as directories for some reason
return None; return None;
} } */
let dir_name = path.file_name()?; let dir_name = path.file_name()?;
// location of the `-` in the dir name let dir_bytes = dir_name.as_bytes();
let split_loc = memchr::memchr(b'-', dir_name.as_bytes())?; if_trace!(trace!("dir_name: {dir_name:?}"));
// location of the b'-' in the dir name
let split_loc = memchr::memchr(b'-', dir_bytes)?;
// The rest of the string including the b'-' separator. (i.e. '-(\d+)kB$')
let split_bytes = &dir_bytes[split_loc..];
if_trace!(debug!("split_bytes (from `-'): {:?}", std::ffi::OsStr::from_bytes(split_bytes)));
// location of the IEC tag (in `KMAP_TAGS`, expected to be b'k') after the number of kilobytes
let (k_loc, k_chr) = 'lookup: loop {
for &tag in KMAP_TAGS {
if_trace!(trace!("attempting check for `{}' ({tag}) in {split_bytes:?}", tag as char));
if let Some(k_loc) = memchr::memchr(tag, split_bytes) {
break 'lookup (k_loc, tag);
} else {
if_trace!(warn!("lookup failed"));
continue; // Try the next tag. (`continue 'lookup` would restart the outer loop and retry the first tag forever.)
}
}
// No suffixes in `KMAP_TAGS` found.
if_trace!(error!("No appropriate suffix ({}) found in {:?}", unsafe { std::str::from_utf8_unchecked(KMAP_TAGS) }, split_bytes));
return None;
};
// The number of kilobytes in this hugepage as a base-10 string
let kb_str = {
let kb_str = &split_bytes[..k_loc];// &dir_bytes[split_loc..k_loc];
if_trace!(trace!("kb_str (raw): {:?}", std::ffi::OsStr::from_bytes(kb_str)));
if kb_str.len() <= 1 {
// There is no number between the `-` and the `kB` (unlikely)
if_trace!(error!("Invalid format of hugepage kB size in pathname `{:?}': Extracted string was `{}'", dir_name, String::from_utf8_lossy(kb_str)));
return None;
}
match std::str::from_utf8(&kb_str[1..]) {
Ok(v) => v,
Err(e) => {
if_trace!(error!("Kilobyte string number (base-10) in pathname `{:?}' is not valid utf8: {e}", kb_str));
drop(e);
return None;
}
}
};
if_trace!(debug!("kb_str (extracted): {kb_str}"));
//TODO: find the `k` (XXX: Is it always in kB? Or do we have to find the last non-digit byte instead?) For now, we can just memchr('k') I think -- look into kernel spec for this later. kb_str.parse::<usize>().ok().map(move |sz| {
if_trace!(debug!("found raw size {sz}, looking up in table for byte result of suffix `{}'", k_chr as char));
kmap_lookup(sz, k_chr)
})
}
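`find_size_bytes()` above takes a directory name such as `hugepages-2048kB`, extracts the number after the `-`, and multiplies it by the factor for its suffix. A simplified standalone sketch of that parsing on a `&str`, using only the standard library; `parse_hugepage_dir` is a hypothetical name, and unlike the original it stops at the first non-digit rather than searching for each tag with `memchr`:

```rust
/// Hypothetical helper: parse the page size in bytes from a directory name
/// of the form `<prefix>-<digits><suffix>`, e.g. `hugepages-2048kB`.
fn parse_hugepage_dir(name: &str) -> Option<usize> {
    // Everything after the first `-`.
    let (_, rest) = name.split_once('-')?;
    // Split the leading run of ASCII digits from the suffix.
    let digits_end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let (digits, suffix) = rest.split_at(digits_end);
    // An empty digit string fails to parse and yields `None`.
    let size: usize = digits.parse().ok()?;
    // Multiplier chosen by the first suffix byte; anything else means bytes.
    let multiplier: usize = match suffix.as_bytes().first() {
        Some(b'k') => 1 << 10,
        Some(b'm') => 1 << 20,
        Some(b'g') => 1 << 30,
        _ => 1,
    };
    // Report overflow as `None` instead of wrapping.
    size.checked_mul(multiplier)
}
```

Using `checked_mul()` is a design choice of the sketch: the original multiplies directly in `kmap_lookup()`, which is a `const fn` over a precomputed table.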
#[cfg(test)]
mod tests
{
use super::*;
#[inline]
fn get_bytes<'a, P: 'a>(from: P) -> eyre::Result<impl Iterator<Item=eyre::Result<usize>> +'a>
where P: AsRef<Path>
{
let dir = from.as_ref().read_dir()?;
Ok(dir
.map(|x| x.map(|n| n.file_name()))
.map(|name| name.map(|name| super::find_size_bytes(name)))
.map(|result| result.flatten()))
}
#[test]
fn find_size_bytes() -> eyre::Result<()>
{
//crate::init()?; XXX: Make `find_size_bytes` return eyre::Result<usize> instead of Option<usize>
let dir = Path::new(super::HUGEPAGE_SIZES_LOCATION).read_dir()?;
for result in dir
.map(|x| x.map(|n| n.file_name()))
.map(|name| name.map(|name| super::find_size_bytes(name)))
{
println!("size: {}", result
.wrap_err(eyre!("Failed to extract name"))?
.ok_or(eyre!("Failed to find size"))?);
}
Ok(())
}
mod map_huge {
use super::*;
/// Some `MAP_HUGE_` constants provided by libc.
const CONSTANTS: &[c_int] = &[
libc::MAP_HUGE_1GB,
libc::MAP_HUGE_1MB,
libc::MAP_HUGE_2MB,
];
#[inline]
fn find_constants_from<'a, I, M>(masks: I) -> impl Iterator<Item=c_int> + 'a
where I: IntoIterator<Item = M> + 'a,
M: PartialEq<c_int> + 'a
{
#[inline]
fn slow_contains(m: &impl PartialEq<c_int>) -> Option<c_int>
{
for c in CONSTANTS {
if m == c {
return Some(*c);
}
}
None None
}
masks.into_iter().filter_map(|mask| slow_contains(&mask))
}
#[inline]
fn find_constants_in(path: impl AsRef<Path>, checked: bool) -> eyre::Result<usize>
{
let mut ok = 0usize;
for bytes in get_bytes(path)? {
let bytes = bytes?;
let flag = if checked {
super::Mask::new_checked(bytes)
.wrap_err(eyre!("Failed to create mask from bytes").with_section(|| bytes.header("Number of bytes was")))?
} else {
super::Mask::new(bytes)
};
if CONSTANTS.contains(&flag.raw()) {
println!("Found pre-set MAP_HUGE_ flag: {flag:X} ({flag:b}, {bytes} bytes)");
ok +=1;
}
}
Ok(ok)
}
#[test]
fn find_map_huge_flags_checked() -> eyre::Result<()>
{
eprintln!("Test array contains flags: {:#?}", CONSTANTS.iter().map(|x| format!("0x{x:X} (0b{x:b})")).collect::<Vec<String>>());
let ok = find_constants_in(super::HUGEPAGE_SIZES_LOCATION, true).wrap_err("Failed to find constants (checked mask creation)")?;
if ok>0 {
println!("Found {ok} / {} of test flags set.", CONSTANTS.len());
Ok(())
} else {
println!("Found none of the test flags set...");
Err(eyre!("Failed to find any matching map flags in test array of `MAP_HUGE_` flags: {:?}", CONSTANTS))
}
}
#[test]
fn find_map_huge_flags() -> eyre::Result<()>
{
eprintln!("Test array contains flags: {:#?}", CONSTANTS.iter().map(|x| format!("0x{x:X} (0b{x:b})")).collect::<Vec<String>>());
let ok = find_constants_in(super::HUGEPAGE_SIZES_LOCATION, false).wrap_err("Failed to find constants (unchecked mask creation)")?;
if ok>0 {
println!("Found {ok} / {} of test flags set.", CONSTANTS.len());
Ok(())
} else {
println!("Found none of the test flags set...");
Err(eyre!("Failed to find any matching map flags in test array of `MAP_HUGE_` flags: {:?}", CONSTANTS))
}
}
#[test]
fn get_masks_matching_constants() -> eyre::Result<()>
{
let masks: usize = find_constants_from(super::get_masks()?
.inspect(|mask| match mask {
Ok(mask) => eprintln!(" -> mask {mask:x} ({mask:b})"),
Err(e) => eprintln!(" ! failed extraction: {e}")
}).filter_map(Result::ok))
.count();
(masks > 0).then(|| drop(println!("Found {masks} masks matching pre-set constants"))).ok_or(eyre!("Found no masks matching constants"))
}
#[test]
fn get_masks() -> eyre::Result<()>
{
let masks: usize = super::get_masks()?
.inspect(|mask| match mask {
Ok(mask) => eprintln!(" -> mask {mask:x} ({mask:b})"),
Err(e) => eprintln!(" ! failed extraction: {e}")
})
.count();
(masks > 0).then(|| drop(println!("Found {masks} masks on system"))).ok_or(eyre!("Found no masks"))
}
#[test]
fn hugetlb_truncate_succeeds() -> eyre::Result<()>
{
/// XXX: Temporary alias until we have an owning `mmap()`'d fd data-structure that impl's `From<impl IntoRawFd>`
type MappedFile = fs::File;
use std::ffi::CString;
let name = CString::new(Vec::from_iter(b"memfd_create_wrapper() test".into_iter().copied())).unwrap();
let mask = super::get_masks()?.next().ok_or(eyre!("No masks found"))?.wrap_err("Failed to extract mask")?;
eprintln!("Using mask: {mask:x} ({mask:b})");
let create = mask.memfd_create_wrapper_flags();
let file: MappedFile = {
let mut file = create(name.as_ptr(), super::MEMFD_CREATE_FLAGS).wrap_err(eyre!("Failed to create file"))?;
println!("Created file {file:?}, attempting ftruncate({})", mask.size());
// XXX: Note: `fallocate()` fails on hugetlb files, but `ftruncate()` does not.
file.truncate_size(mask.size()).wrap_err(eyre!("ftruncate() failed"))?;
println!("Set file-size to {}", mask.size());
file
}.into();
drop(file); //TODO: `mmap()` file to `mask.size()`.
Ok(())
}
#[test]
#[should_panic]
// TODO: `write()` syscall is not allowed here. Try to come up with an example that uses `splice()` and `sendfile()`.
fn hugetlb_write_fails()
{
fn _hugetlb_write_fails() -> eyre::Result<()> {
//crate::init()?;
use std::ffi::CString;
let name = CString::new(Vec::from_iter(b"memfd_create_wrapper() test".into_iter().copied())).unwrap();
let mask = super::get_masks()?.next().ok_or(eyre!("No masks found"))?.wrap_err("Failed to extract mask")?;
eprintln!("Using mask: {mask:x} ({mask:b})");
let create = mask.memfd_create_wrapper_flags();
let buf = {
let mut buf = vec![0; name.as_bytes_with_nul().len()];
println!("Allocated {} bytes for buffer", buf.len());
let mut file: fs::File = {
let mut file = create(name.as_ptr(), super::MEMFD_CREATE_FLAGS)/*unsafe {super::RawFile::from_raw_fd( libc::memfd_create(name.as_ptr(), super::MEMFD_CREATE_FLAGS | mask.mask()) ) };*/.wrap_err(eyre!("Failed to create file"))?;
println!("Created file {file:?}, truncating to length of mask: {}", mask.size());
// XXX: Note: `fallocate()` fails on hugetlb files, but `ftruncate()` does not.
file.truncate_size(mask.size()).wrap_err(eyre!("ftruncate() failed"))?;
println!("Set file-size to {}", mask.size());
file
}.into();
use std::io::{Read, Write, Seek};
println!("Writing {} bytes {:?}...", name.as_bytes_with_nul().len(), name.as_bytes_with_nul());
file.write_all(name.as_bytes_with_nul()).wrap_err(eyre!("Writing failed"))?;
println!("Seeking back to 0...");
file.seek(std::io::SeekFrom::Start(0)).wrap_err(eyre!("Seeking failed"))?;
println!("Reading {} bytes...", buf.len());
file.read_exact(&mut buf[..]).wrap_err(eyre!("Reading failed"))?;
println!("Read {} bytes into: {:?}", buf.len(), buf);
buf
};
assert_eq!(CString::from_vec_with_nul(buf).expect("Invalid contents read"), name);
Ok(())
}
_hugetlb_write_fails().unwrap();
}
}
} }
