You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

2.5 KiB

refset - non-owning HsahSet

A hash-set analogue that does not own its data.

It can be used to "mark" items without the need to transfer ownership to the map

Example use case

/// Process arguments while ignoring duplicates
fn process_args(args: impl IntoIterator<Item=String>) {
  let mut same=  HashRefSet::new();
  for argument in args.into_iter()
  {
    if !same.insert(argument.as_str()) {
      // Already processed this input, ignore
      continue;
    }
    //do work...
  }
}

Serialisation support with serde crate

HashRefSet and HashType both implement Serialize and Deserialize from the serde crate if the serde feature is enabled. By default it is not.

Hashing

We use the SHA512 hashing algorithm for the implementation at present. I may implement the ability to choose different types, but as of now I think it is sufficient.

Drawbacks

Since the item is not inserted itself, we cannot use Eq to double check there was not a hash collision. While the hashing algorithm used (Sha512) is extremely unlikely to produce collisions, especially for small data types, keep in mind that it is not infallible.

Speed

HashRefSet is significantly slower than HashSet, so HashSet should be preferred in most cases. Even when Clone is required to insert into HashSet, it can be ~10x faster for trivial data structures. HashRefSet should be used if Clone is either not an option, or Clone is a significantly heavy operation on the type you're inserting.

Benchmark Tests Result
owning_strings Inserts String into HashSet by cloning ~4,538 ns/iter
non_owning_strings Inserts str into HashRefSet by reference ~48,271 ns/iter
owning_ints Inserts u32 into HashSet by copy ~937 ns/iter
non_owning_ints Inserts &u32 into HashRefSet by reference ~31,089 ns/iter

When to use over HashSet

  • The type you're inserting needs to be both in the set and moved elsewhere. (see exmaple)
  • Simply using Clone to insert a copy of the item into a HashSet is not possible (non-Clone type) or is a significantly heavy operation. (see benchmarks)
  • The fallibility of potential (albeing extremely unlikely) collisions of the SHA512 algorithm is not a concern
  • You need to insert an unsized type into a HashSet

License

MIT