re: Added `RegexMatcher`, a dynamic-dispatch-friendly version of the trait providing methods for execution-only: The value returned by `RegexEngine::try_compile_boxed()` can be consumed into `Box<dyn RegexMatcher + ...>` via an `Into::into()` call if dynamic dispatch over the compiled regular-expression is needed in the future. (TODO: There are also extension methods for `&mut self` & `Arc<Self>` in `RegexMatcher` that currently are unused, but may be useful for PCRE `Regex` if used.)
NOTE: The purpose of this is to allow something like: `let compiled_regex: Box<dyn re::RegexMatcher + Send + Sync + 'static> = if cli.use_pcre() { re::Regex::try_compile_boxed(cli.regex)?.into() } else { re::NonPCRERegex::try_compile_boxed(cli.regex)[.inspect_mut(|re| re.prepare_regex())]?.into() };`
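A minimal sketch of the runtime selection described in the NOTE above, using only the standard library. `Matcher`, `Contains`, `Prefix`, and `select_engine` are hypothetical stand-ins for `re::RegexMatcher`, the two engines, and the CLI glue; they do not reproduce the real trait or its `try_compile_boxed()` signature.

```rust
use std::error::Error;

/// Hypothetical stand-in for `re::RegexMatcher`: an object-safe, execution-only trait.
pub trait Matcher {
    /// Returns the matched text, or `None` if `input` does not match.
    fn exec(&self, input: &str) -> Option<String>;
}

/// Stand-in for the PCRE-backed engine: matches when `input` contains the needle.
pub struct Contains(String);
/// Stand-in for the non-PCRE engine: matches when `input` starts with the prefix.
pub struct Prefix(String);

/// Shared "compile" step for both stand-ins: an empty pattern is rejected.
fn non_empty(pattern: &str) -> Result<String, String> {
    if pattern.is_empty() {
        Err("empty pattern".to_string())
    } else {
        Ok(pattern.to_string())
    }
}

impl Contains {
    pub fn try_compile(pattern: &str) -> Result<Self, String> {
        non_empty(pattern).map(Contains)
    }
}
impl Prefix {
    pub fn try_compile(pattern: &str) -> Result<Self, String> {
        non_empty(pattern).map(Prefix)
    }
}

impl Matcher for Contains {
    fn exec(&self, input: &str) -> Option<String> {
        input.contains(self.0.as_str()).then(|| self.0.clone())
    }
}
impl Matcher for Prefix {
    fn exec(&self, input: &str) -> Option<String> {
        input.starts_with(self.0.as_str()).then(|| self.0.clone())
    }
}

/// The erased type both branches are coerced into.
pub type DynMatcher = Box<dyn Matcher + Send + Sync + 'static>;

/// Pick an engine at runtime; both arms coerce to the same boxed trait object.
pub fn select_engine(use_pcre: bool, pattern: &str) -> Result<DynMatcher, Box<dyn Error>> {
    let matcher: DynMatcher = if use_pcre {
        Box::new(Contains::try_compile(pattern)?)
    } else {
        Box::new(Prefix::try_compile(pattern)?)
    };
    Ok(matcher)
}
```

After selection the caller only sees `DynMatcher`, so the rest of the program does not need to know which engine compiled the pattern.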
Fortune for rematch's current commit: Half blessing − 半吉
[For feature `perl`]: Started re-work of `re` to allow dynamic dispatch, so the user can select whether PCRE extensions are enabled.
/// User-provided configuration of how the program should behave here
#[derive(Debug, Args)]
pub struct Config
{
/// Use the PCRE (JS-like) extended regular expression compiler.
///
/// __NOTE__: The binary must have been compiled with build feature `perl` to use this option.
///
/// # Feature difference
/// By default, the expression syntax does not support things like negative lookahead and other backtrack-requiring regex features.
///
/// ## Efficiency
/// Note that non-PCRE expressions are more efficient in general, and can also enable parallel processing of strings where there are many (e.g. a long list of lines from `stdin` can be matched against in parallel.)
///
/// It is ill-advised to enable PCRE on large inputs unless those features are required.
//TODO: Should we have PCRE on by default or not...? I think we should maybe have it on by default if the feature is enabled... But that will mess with input parallelism... XXX: Perhaps we can auto-detect if to use PCRE or not (e.g. try compiling to regex first, then PCRE if that fails?)
#[arg(short, long)]// XXX: Can we add a clap `value_parser!(FeatureOnBool<"perl">)` which fails to parse its `from_str()` impl if the feature is not enabled. Is this possible with what we currently have? We may be able to with macros, e.g expand a macro to `FeatureOnBool<"perl", const { cfg!(feature="perl") }>` or something similar? (NOTE: If `clap` has a better mechanism for this, use that instead of re-inventing it tho.)
// #[cfg(feature="perl")] //XXX: Do we want this option to be feature-gated? Or should we fail with an error `if !cfg!(feature="perl") && self.extended`? I think the latter would make things easier (since the Regex engine gates PCRE-compilation transparently to the API user [see `crate::re::Regex`], we don't need to gate it this way outside of `re`; if we remove this gate we can just use `cfg!()` everywhere here, which makes things **MUCH** cleaner.) It also means the user of a non-PCRE build will at least know why their PCRE flag is failing and that it can be built with the "perl" feature, instead of it being *totally* invisible to the user if the feature is off.
extended: bool,
/// Delimit read input/output strings from/to `stdin`/`stdout` by NUL ('\0') characters instead of newlines.
///
/// This only affects the output of each string's match groups, not the groups themselves, those will still be delimited by TAB literals in the output.
#[arg(short='0', long)]
pub zero: bool, //XXX: Add `--field=`/`--ifs` option, put these in same group. Maybe add `--delimit-groups=` to change the group delimiter from `\t` to user-specified value.
}
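The auto-detection idea floated in the TODO on `extended` (try the plain compiler first, fall back to PCRE if that fails) could look roughly like the sketch below. `compile_plain`, `compile_pcre`, `Engine`, and `auto_select` are hypothetical stand-ins, and the look-around check is only a toy substitute for a real compile error.

```rust
/// Which engine ended up accepting the pattern.
#[derive(Debug, PartialEq, Eq)]
pub enum Engine {
    Plain,
    Pcre,
}

/// Hypothetical stand-in for the plain compiler: rejects look-ahead syntax.
fn compile_plain(pattern: &str) -> Result<(), String> {
    if pattern.contains("(?=") || pattern.contains("(?!") {
        Err(format!("look-around is not supported in `{}`", pattern))
    } else {
        Ok(())
    }
}

/// Hypothetical stand-in for the PCRE compiler: only usable when compiled in.
fn compile_pcre(_pattern: &str, pcre_available: bool) -> Result<(), String> {
    if pcre_available {
        Ok(())
    } else {
        Err("binary was built without feature `perl`".to_string())
    }
}

/// Prefer the plain engine; only fall back to PCRE when plain compilation fails.
pub fn auto_select(pattern: &str, pcre_available: bool) -> Result<Engine, String> {
    match compile_plain(pattern) {
        Ok(()) => Ok(Engine::Plain),
        Err(plain_err) => compile_pcre(pattern, pcre_available)
            .map(|()| Engine::Pcre)
            .map_err(|pcre_err| format!("{}; PCRE fallback failed: {}", plain_err, pcre_err)),
    }
}
```

One consequence of this ordering is that inputs which do not need PCRE features keep the cheaper engine, and with it the option of parallel matching.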
impl Config
{
/// Whether it is requested to use PCRE regex instead of regular regex.
///
/// # Interaction with feature gating of ~actual~ PCRE support via `feature="perl"`
/// Note that if the "perl" feature is not enabled, this may still return `true`.
/// If the user requests PCRE where it is not available, the caller should return an error/panic to the user telling her that.
#[inline(always)]
//TODO: Make `extended` public and remove this accessor?
pub fn use_pcre(&self) -> bool
{
//#![allow(unreachable_code)]
//#[cfg(feature="perl")] return self.extended; //TODO: See above comment on un-gating `self.extended`
//false
self.extended
}
}
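Since `use_pcre()` may return `true` even when the `perl` feature is off, the caller has to turn that combination into an error. A minimal sketch of such a check follows; `PcreUnavailable` and `check_pcre_request` are hypothetical names, and `pcre_compiled_in` is a plain parameter here so the logic can be exercised either way (in the real program it would presumably be `cfg!(feature = "perl")`).

```rust
use std::fmt;

/// Hypothetical error for "PCRE requested, but not compiled in".
#[derive(Debug, PartialEq, Eq)]
pub struct PcreUnavailable;

impl fmt::Display for PcreUnavailable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("PCRE was requested, but this binary was built without feature `perl`")
    }
}

impl std::error::Error for PcreUnavailable {}

/// `extended` mirrors `Config::use_pcre()`.
/// Returns whether PCRE should be used, or an error the user can act on.
pub fn check_pcre_request(
    extended: bool,
    pcre_compiled_in: bool,
) -> Result<bool, PcreUnavailable> {
    if extended && !pcre_compiled_in {
        Err(PcreUnavailable)
    } else {
        Ok(extended)
    }
}
```

Reporting the missing feature by name tells the user of a non-PCRE build why the flag failed and how to rebuild.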
/// A string value that may be provided to the CLI, or delegated to `stdio`.
pub type MaybeString = MaybeValue<Box<str>>;
/// A path that may represent an `stdio` file-descriptor instead of a named file.
pub type MaybePath = MaybeValue<Box<Path>>;
/// `rematch` is a simple command-line tool for matching one or more input strings against a regular expression and printing the capture groups.
///
/// The input string(s) can be provided on the command line, or as a line-delimited (by default) stream from `stdin`.
//TODO: Allow ranges & fallible captures, so lines that match group 1 but not 2 will not cause output failure if given `1 2?` but will if given `1 2` (XXX: Is this actually meaningful/possible? Can we do this at all? I'm pretty sure `/(?:(.))?/` still creates an (empty) group? So perhaps, syntax for failing on *empty* group matches...? like, `1! 2` for "group #1 *required*, group #2 is not requested?")
groups: Vec<usize>, // TODO: How to dedup (XXX: Do we want to de-dup? Maybe the user wants group `1` twice? I think it's fine; we also need to preserve the user's ordering of group indices.)
}
impl Cli {
/// Get the input string to match on
///
/// If the requested input is `stdin`, `None` is returned.
#[inline]
pub fn input_string(&self) -> Option<&str>
{
self.string.value().map(AsRef::as_ref)
}
/// Get the string to build the regular expression from
pub fn regex_string(&self) -> &str
{
&self.regex[..]
}
/// Get the match group(s) to print in the output
#[inline]
pub fn groups(&self) -> &[usize]
{
&self.groups[..]
}
/// Get the number of match groups requested.
#[inline]
pub fn num_groups(&self) -> usize
{
self.groups.len()
}
}
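How the requested `groups` and the `zero` flag might combine on output can be sketched as follows: groups are joined by TAB in the user's order (repeats kept, no de-duplication), and each record is terminated by NUL or newline. `format_record` and `record_delimiter` are hypothetical helpers, and treating an out-of-range index as `None` is an assumption rather than the tool's documented behaviour.

```rust
/// Join the requested groups of one match with TABs, keeping the user's
/// order and any repeated indices. Returns `None` if an index is out of range.
pub fn format_record(captures: &[&str], groups: &[usize]) -> Option<String> {
    let picked: Option<Vec<&str>> = groups
        .iter()
        .map(|&index| captures.get(index).copied())
        .collect();
    picked.map(|fields| fields.join("\t"))
}

/// Delimiter written after each record: NUL with `--zero`, newline otherwise.
pub fn record_delimiter(zero: bool) -> char {
    if zero {
        '\0'
    } else {
        '\n'
    }
}
```

For captures `["ab12", "ab", "12"]` and groups `2 1 2`, this produces `12`, `ab`, `12` separated by TABs.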
/// Parse the command-line arguments passed to the program
#![cfg_attr(feature="unstable", feature(impl_trait_in_assoc_type))]// XXX: Re-work `re::RegexEngine` to be able to remove this if we can, so we can use non-allocating `try_exec()` on stable...
mod re;
mod text;
mod args;
fn main() -> Result<(), Box<dyn std::error::Error>>
use color_eyre::{
    eyre::{
        self,
        eyre,
        WrapErr as _,
    },
    SectionExt as _, Help as _,
};
fn initialise() -> eyre::Result<()>
{
color_eyre::install()?;
Ok(())
}
fn main() -> eyre::Result<()>
{
initialise().wrap_err("Fatal: Failed to install panic handler")?;
//TODO: Re-work this to allow non-matched groups (i.e. `Option<Cow<'static, str>>` or something...) to be communicated without `"".into()`.
pub type Groups = FrozenVector<FrozenString>;
//TODO: We need to provide a `NonPCRERegex` that we can use with runtime polymorphism when PCRE is disabled/enabled by the user's CLI options (see `args::Config::extended`.)
// This `NonPCRERegex` can be written agnostic to the `perl` feature being enabled, as `Regex` below will use the optionally-included package `pcre` when the feature is enabled, but the `regex` package is *always* available.
//compile_error!("TODO: Remove this trait and refactor this shit. XXX: We don't need all this dynamic dispatch shit, we can just have an `enum` of `regex::Regex` & `Regex` if we need to, dispatching the `exec` call through that; as the compile error type differs & there is no exec error for non-PCRE regex exec. ");
//compile_error!("XXX: TODO: (I don't think we'll even need to do that though, just a helper ext-trait with the same types as the below trait and non-dyn methods -- mostly just `exec() -> Result<Option<Groups>, Self::ExecError>` -- is good enough.)")
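The `enum` alternative mentioned in the comments above can be sketched without any trait objects. `PlainEngine`, `PcreEngine`, `ExecError`, and `AnyRegex` are hypothetical stand-ins; the length limit in `PcreEngine` is an arbitrary way to give the stand-in a failing path, reflecting only that PCRE execution can fail while plain execution cannot.

```rust
/// Stand-in for `regex::Regex`: execution cannot fail.
pub struct PlainEngine {
    pub needle: String,
}

/// Stand-in for the PCRE-backed `Regex`: execution may fail.
pub struct PcreEngine {
    pub needle: String,
    pub limit: usize,
}

/// Hypothetical execution error; only the PCRE stand-in ever produces it.
#[derive(Debug, PartialEq, Eq)]
pub struct ExecError(pub String);

impl PlainEngine {
    pub fn exec<'a>(&self, input: &'a str) -> Option<&'a str> {
        input
            .find(self.needle.as_str())
            .map(|start| &input[start..start + self.needle.len()])
    }
}

impl PcreEngine {
    pub fn exec<'a>(&self, input: &'a str) -> Result<Option<&'a str>, ExecError> {
        if input.len() > self.limit {
            return Err(ExecError(format!("input longer than {} bytes", self.limit)));
        }
        Ok(input
            .find(self.needle.as_str())
            .map(|start| &input[start..start + self.needle.len()]))
    }
}

/// Static dispatch over both engines through one `exec()` entry point.
pub enum AnyRegex {
    Plain(PlainEngine),
    Pcre(PcreEngine),
}

impl AnyRegex {
    pub fn exec<'a>(&self, input: &'a str) -> Result<Option<&'a str>, ExecError> {
        match self {
            // The infallible engine is lifted into the common `Result` type.
            AnyRegex::Plain(re) => Ok(re.exec(input)),
            AnyRegex::Pcre(re) => re.exec(input),
        }
    }
}
```

Compared with `Box<dyn RegexMatcher>`, this keeps borrowed output and avoids an allocation, at the cost of a closed set of engines.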
pub trait RegexMatcher
{
/// Attempt to match this regular expression against `string`, and if successful, pass each to callback `result` while `result` returns `Ok(true)`.
///
/// # Callback feeding from match `try_exec()` as an iterator.
/// Once `result(i, n)` -- where `i` is the index of the group returned from the iterator of `try_exec()`, and `n` is the borrowed string of that item -- returns a result other than `Ok(true)`, the function will short-circuit in the following way:
///
/// * `Err(e)` - `Err(e.into())` will be returned.
/// * `Ok(false)` - `Ok(Some(()))` will be returned (a *successful* result, despite the rest of the iterator being ignored.)
/// And if the iterator completes before either of the first two are returned from `result`, `Ok(Some(()))` will be returned as well.
///
/// The short-circuit will happen before the callback is invoked at all if `RegexEngine::try_exec()` returns the following:
/// - `Err(e)` will short-circuit to `return Err(e)`.
/// - `Ok(None)` will short-circuit to `return Ok(None)`.
///
/// Note that the case that `Output<'_>` is a lazy iterator works best when working through this dynamic interface.
///
/// # Return
/// The only time `Ok(None)` is returned is if `result` is never executed because the returned value of `try_exec()` is `None`.
/// An empty iterator wrapped in a `Some(_)` will still be returned as `Ok(Some(()))` from this function.
///
/// Any `Err(_)` result will be propagated from this function (from `try_exec()` or any call to `result(i, n)`) to the caller via `Err(e.into())` whenever it may appear.
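The short-circuit rules documented above can be summarised as a free function. This is a sketch of the described behaviour, not the trait's actual method: `exec_into` is a hypothetical name, the result of `try_exec()` is passed in as `exec_result`, and a single error type `E` is used instead of the `e.into()` conversion.

```rust
/// Feed each group of a successful match to `result(index, group)`.
///
/// * `Err(e)` from `exec_result` or from the callback is propagated.
/// * `Ok(None)` (no match) returns `Ok(None)` without invoking the callback.
/// * `Ok(false)` from the callback stops early but still returns `Ok(Some(()))`.
/// * An exhausted (or empty) iterator returns `Ok(Some(()))`.
pub fn exec_into<'a, I, E, F>(
    exec_result: Result<Option<I>, E>,
    mut result: F,
) -> Result<Option<()>, E>
where
    I: Iterator<Item = &'a str>,
    F: FnMut(usize, &'a str) -> Result<bool, E>,
{
    let groups = match exec_result? {
        Some(groups) => groups,
        // No match: the callback is never called.
        None => return Ok(None),
    };
    for (index, group) in groups.enumerate() {
        if !result(index, group)? {
            // `Ok(false)`: ignore the rest of the iterator, still a success.
            break;
        }
    }
    Ok(Some(()))
}
```

Because the groups are pulled one at a time, a lazy iterator does no work for groups after the callback has asked to stop.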
/// Same as `try_exec_into()`, but can rely on being the *sole owner* of `self` *while invoked*.
///
/// __NOTE__: The generic implementation of this function does not distinguish ownership, and thus `try_exec_into()` should be preferred unless an explicit owning version has been implemented.
// (__XXX__: Can we impl this for `Regex` when using PCRE to bypass need to lock mutex?)
/// Same as `try_exec_into()`, but can rely on `self` outliving all references within the call.
///
/// Whether `Ok(_)` is returned or not, this `Arc` ref of `self` is consumed after this call.
///
/// __NOTE__: In the generic implementation of this function, if `self` is the only owner of the `Arc<Self>`, it *may* try to dispatch to the owning `try_owned_exec_into()` instead.
/// But **also note that** the generic implementation of `try_owned_exec_into()` defers to `try_exec_into()` anyway.
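The relationship between the three variants can be sketched with `Arc::try_unwrap`. `Matcher`, `Route`, `Needle`, and the method names are hypothetical; `Route` exists only so the chosen path is observable, and the real methods take a callback rather than returning a `bool`.

```rust
use std::sync::Arc;

/// Which implementation actually ran (for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Shared,
    Owned,
}

pub trait Matcher {
    /// Shared-reference execution (stand-in for `try_exec_into()`).
    fn exec(&self, input: &str) -> (Route, bool);

    /// Owning execution; the generic version simply defers to `exec()`.
    fn exec_owned(self, input: &str) -> (Route, bool)
    where
        Self: Sized,
    {
        self.exec(input)
    }

    /// `Arc` execution: the handle is consumed either way, and the owning
    /// path is used only when this was the last reference.
    fn exec_arc(self: Arc<Self>, input: &str) -> (Route, bool)
    where
        Self: Sized,
    {
        match Arc::try_unwrap(self) {
            Ok(owned) => owned.exec_owned(input),
            Err(shared) => shared.exec(input),
        }
    }
}

/// Stand-in engine with an explicit owning implementation.
pub struct Needle(pub String);

impl Matcher for Needle {
    fn exec(&self, input: &str) -> (Route, bool) {
        (Route::Shared, input.contains(self.0.as_str()))
    }

    fn exec_owned(self, input: &str) -> (Route, bool) {
        (Route::Owned, input.contains(self.0.as_str()))
    }
}
```

The `where Self: Sized` bounds keep the by-value and `Arc` methods off the vtable, so the trait itself stays usable as `dyn Matcher`.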
/// Trait represents a regular-expression object that can be compiled from a string and can match on any number of strings from a shared-reference (possibly in parallel, see below.)
///
/// The output of the match operation is a generic iterator over the match groups that matched (__XXX__: with empty strings denoting non-matches for now to keep the indices valid. __TODO__: I-it does keep them valid, right??) wrapped in an `Option<_>`, which will return `None` if the string provided does not match the whole regular expression.
/// Should `try_exec()` be run over an iterator of `string`s in parallel or in sequence? Or does it not matter?
/// Where `num` is the number of `string`s (if known by caller.)
///
/// We assume 0 `string`s will not cause any execution.
///
/// # Returns
/// - `Some(true)` - Yes, do prefer run in parallel.
/// - `Some(false)` - No, do **not** run in parallel if possible.
/// - ~default~ `None` - Unknown. It is possible to run in parallel, but it either does not matter, or may not cause tangible performance benefits over running in sequence.
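One way a caller might act on this tri-state hint is sketched below. `run_in_parallel` and its `threshold` parameter are assumptions made for illustration; the source only defines the meaning of the three return values, not a policy for the `None` case.

```rust
/// Decide whether to match in parallel.
///
/// * `hint` is the engine's answer (`Some(true)`, `Some(false)`, or `None`).
/// * `num` is the number of strings, if the caller knows it.
/// * `threshold` is a caller-chosen cut-off used only when the hint is `None`.
pub fn run_in_parallel(hint: Option<bool>, num: Option<usize>, threshold: usize) -> bool {
    match (hint, num) {
        // Zero strings never cause any execution.
        (_, Some(0)) => false,
        // The engine has an explicit preference: follow it.
        (Some(prefer), _) => prefer,
        // No preference: only go parallel for a known, large enough input.
        (None, Some(count)) => count >= threshold,
        (None, None) => false,
    }
}
```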
// SAFETY: The implementation of `Regex::exec()` has no path that can return an error (XXX: Why does it even return `Result` anyway...?)
Ok(unsafe{
Self::exec(&self,string).unwrap_unchecked()
})
}
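If the non-PCRE `exec()` truly has no error path, one safe alternative to `unwrap_unchecked()` is to make that explicit in the type. The sketch below assumes the error type could be changed to `std::convert::Infallible`, which the source does not confirm; `exec` and `exec_unwrapped` are hypothetical free functions.

```rust
use std::convert::Infallible;

/// Stand-in for an `exec()` that cannot fail: the error type has no values.
pub fn exec(needle: &str, input: &str) -> Result<Option<usize>, Infallible> {
    Ok(input.find(needle))
}

/// Unwrap without `unsafe`: the empty match proves the `Err` arm is unreachable.
pub fn exec_unwrapped(needle: &str, input: &str) -> Option<usize> {
    match exec(needle, input) {
        Ok(found) => found,
        Err(never) => match never {},
    }
}
```

With this shape the compiler checks the "no error path" claim, so a later change that introduces a real error breaks the build instead of causing undefined behaviour.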
/// PCRE supports `study()`ing the regular expression, which we might want to do if we have more than a few strings to match on.
///
/// If PCRE is not enabled, and we use the Rust regex `regex::Regex`; it does not require/support additional optimisations, so keep the default noop-impl from the trait if this feature is not enabled.
// XXX: Eh.. The `Arc` means we gotta lock here...
// match (&mut self.internal).get_mut() {
// Ok(v) => v.study(),
// Err(mut v) => v.get_mut().study(),
// };
// NOTE: If there is another lock held while *this* method is being invoked, it can *only* make logical sense that it is calling the same method on a different thread. So do not block to call this. (XXX: This is only required because of the silly locking shit we gotta do here...)
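The non-blocking behaviour described in the comments above can be sketched with `Arc::get_mut` and `Mutex::try_lock`. `Engine` and `prepare` are hypothetical stand-ins for `pcre::Pcre` and the real method; only the locking pattern is the point.

```rust
use std::sync::{Arc, Mutex, TryLockError};

/// Stand-in for `pcre::Pcre`; `study()` just records that it ran.
#[derive(Debug, Default)]
pub struct Engine {
    pub studied: bool,
}

impl Engine {
    pub fn study(&mut self) {
        self.studied = true;
    }
}

/// Study the engine without ever blocking. Returns `true` if `study()` ran.
pub fn prepare(internal: &mut Arc<Mutex<Engine>>) -> bool {
    // Sole owner of the `Arc`: no locking is needed at all.
    if let Some(mutex) = Arc::get_mut(internal) {
        match mutex.get_mut() {
            Ok(engine) => engine.study(),
            Err(poisoned) => poisoned.into_inner().study(),
        }
        return true;
    }
    // Shared: take the lock only if it is free right now.
    match internal.try_lock() {
        Ok(mut engine) => {
            engine.study();
            true
        }
        Err(TryLockError::Poisoned(poisoned)) => {
            poisoned.into_inner().study();
            true
        }
        // Someone else holds the lock: skip instead of waiting.
        Err(TryLockError::WouldBlock) => false,
    }
}
```

Skipping on `WouldBlock` follows the reasoning in the note: a concurrent holder is assumed to be doing the same preparation, so waiting would gain nothing.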
/// Non-PCRE / non-extended regex (regardless of if the `perl` feature is enabled.)
pub type NonPCRERegex = regex::Regex;
/// PCRE-enabled (if feature is enabled, see [`IS_EXTENDED`]) regex.
#[derive(Debug, Clone)]
pub struct Regex
{
#[cfg(feature="perl")]
internal: Arc<Mutex<pcre::Pcre>>,// XXX: Can we make parallel usage a bit less... expensive? TODO: How expensive is it to clone these into a thread-local cache, for instance?
#[cfg(not(feature = "perl"))]
internal: regex::Regex,
}
impl Regex
{
/// If the implementation uses PCRE instead of default regex.