You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

74 lines
3.7 KiB

# refset - non-owning `HashSet`
A hash-set analogue that does not own its data.
It can be used to "mark" items without the need to transfer ownership to the map
# Example use case
```rust
/// Process arguments while ignoring duplicates
fn process_args(args: impl IntoIterator<Item=String>) {
let mut same= HashRefSet::new();
for argument in args.into_iter()
{
if !same.insert(argument.as_str()) {
// Already processed this input, ignore
continue;
}
//do work...
}
}
```
# Serialisation support with `serde` crate
`HashRefSet` and `HashType` both implement `Serialize` and `Deserialize` from the `serde` crate if the `serde` feature is enabled. By default it is not.
# Hashing
We use the SHA512 hashing algorithm for the implementation at present.
I may implement the ability to choose different types, but as of now I think it is sufficient.
# Drawbacks
Since the item is not inserted itself, we cannot use `Eq` to double check there was not a hash collision.
While the hashing algorithm used (Sha512) is extremely unlikely to produce collisions, especially for small data types, keep in mind that it is not infallible.
## Speed
`HashRefSet` is significantly slower than `HashSet`, so `HashSet` should be preferred in most cases.
Even when `Clone` is required to insert into `HashSet`, it can be ~10x faster for trivial data structures.
`HashRefSet` should be used if `Clone` is either not an option, or `Clone` is a significantly heavy operation on the type you're inserting.
| Benchmark | Tests | Result |
|--------------------|-----------------------------------------------|-----------------|
| owning_strings | Inserts `String` into `HashSet` by cloning | ~4,538 ns/iter |
| non_owning_strings | Inserts `str` into `HashRefSet` by reference | ~48,271 ns/iter |
| owning_ints | Inserts `u32` into `HashSet` by copy | ~937 ns/iter |
| non_owning_ints | Inserts `&u32` into `HashRefSet` by reference | ~31,089 ns/iter |
## When to use over `HashSet`
* The type you're inserting needs to be both in the set and moved elsewhere. (see exmaple)
* Simply using `Clone` to insert a copy of the item into a `HashSet` is not possible (non-`Clone` type) or is a significantly heavy operation. (see benchmarks)
* The fallibility of potential (albeing extremely unlikely) collisions of the SHA512 algorithm is not a concern
* You need to insert an unsized type into a `HashSet`
## Smallmap implementation
With the `smallmap` feature enabled, the `small` module also provides the same API as `HashRefSet` via `SmallRefMap`.
It is backed by `smallmap::Map` instead of `HashSet`, which could potentially have some performance or memory usage impacts, or not.
The hashing algorithm and usage is otherwise identical for now, but this may change.
### Benchmarks of `SmallRefMap`
Comparing with cloning or copying into `smallmap::Map`.
Largely there are the same performance penalties as the above table, with very minor differences.
| Benchmark | Tests | Result |
|--------------------|-----------------------------------------------|-----------------|
| owning_strings | Inserts `String` into `HashSet` by cloning | ~3,096 ns/iter |
| non_owning_strings | Inserts `str` into `HashRefSet` by reference | ~47,302 ns/iter |
| owning_ints | Inserts `u32` into `HashSet` by copy | ~316 ns/iter |
| non_owning_ints | Inserts `&u32` into `HashRefSet` by reference | ~30,046 ns/iter |
Each page of the `SmallRefMap` will consume at least 16kb of memory however.
This may not be very desireable, but is still an available feature.
# License
MIT