Concurrent HashSet<T> in .NET Framework? Concurrent HashSet<T> in .NET Framework? multithreading multithreading

Concurrent HashSet<T> in .NET Framework?


Your implementation is correct. The .NET Framework does not provide a built-in concurrent hashset type, unfortunately. However, there are some workarounds.

ConcurrentDictionary (recommended)

This first one is to use the class ConcurrentDictionary<TKey, TValue> in the namespace System.Collections.Concurrent. In the case, the value is pointless, so we can use a simple byte (1 byte in memory).

private ConcurrentDictionary<string, byte> _data;

This is the recommended option because the type is thread-safe and provide you the same advantages than a HashSet<T> except key and value are different objects.

Source: Social MSDN

ConcurrentBag

If you don't mind about the duplicate entries, you can use the class ConcurrentBag<T> in the same namespace of the previous class.

private ConcurrentBag<string> _data;

Self-implementation

Finally, as you did, you can implement your own data type, using lock or other ways that the .NET provides you to be thread-safe. Here is a great example: How to implement ConcurrentHashSet in .Net

The only drawback of this solution is that the type HashSet<T> doesn't officially concurrent access, even for reading operations.

I quote the code of the linked post (originally written by Ben Mosher).

using System;using System.Collections.Generic;using System.Threading;namespace BlahBlah.Utilities{    public class ConcurrentHashSet<T> : IDisposable    {        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);        private readonly HashSet<T> _hashSet = new HashSet<T>();        #region Implementation of ICollection<T> ...ish        public bool Add(T item)        {            _lock.EnterWriteLock();            try            {                return _hashSet.Add(item);            }            finally            {                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();            }        }        public void Clear()        {            _lock.EnterWriteLock();            try            {                _hashSet.Clear();            }            finally            {                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();            }        }        public bool Contains(T item)        {            _lock.EnterReadLock();            try            {                return _hashSet.Contains(item);            }            finally            {                if (_lock.IsReadLockHeld) _lock.ExitReadLock();            }        }        public bool Remove(T item)        {            _lock.EnterWriteLock();            try            {                return _hashSet.Remove(item);            }            finally            {                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();            }        }        public int Count        {            get            {                _lock.EnterReadLock();                try                {                    return _hashSet.Count;                }                finally                {                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();                }            }        }        #endregion        #region Dispose        public void Dispose()        {            Dispose(true);            GC.SuppressFinalize(this);        }        protected virtual void Dispose(bool disposing)        {            if (disposing)                if (_lock != null)                    _lock.Dispose();        }        ~ConcurrentHashSet()        {            Dispose(false);        }        #endregion    }}

EDIT: Move the entrance lock methods ouside the try blocks, as they could throw an exception and execute the instructions contained in the finally blocks.


Instead of wrapping a ConcurrentDictionary or locking over a HashSet I created an actual ConcurrentHashSet based on ConcurrentDictionary.

This implementation supports basic operations per item without HashSet's set operations as they make less sense in concurrent scenarios IMO:

var concurrentHashSet = new ConcurrentHashSet<string>(    new[]    {        "hamster",        "HAMster",        "bar",    },    StringComparer.OrdinalIgnoreCase);concurrentHashSet.TryRemove("foo");if (concurrentHashSet.Contains("BAR")){    Console.WriteLine(concurrentHashSet.Count);}

Output: 2

You can get it from NuGet here and see the source on GitHub here.


Since noone else mentioned it, I will offer an alternative approach that may or may not be appropriate for your particular purpose:

Microsoft Immutable Collections

From a blog post by the MS team behind:

While creating and running concurrently is easier than ever, one of the fundamental problems still exists: mutable shared state. Reading from multiple threads is typically very easy, but once the state needs to be updated, it gets a lot harder, especially in designs that require locking.

An alternative to locking is making use of immutable state. Immutable data structures are guaranteed to never change and can thus be passed freely between different threads without worrying about stepping on somebody else’s toes.

This design creates a new problem though: How do you manage changes in state without copying the entire state each time? This is especially tricky when collections are involved.

This is where immutable collections come in.

These collections include ImmutableHashSet<T> and ImmutableList<T>.

Performance

Since the immutable collections use tree data structures underneath to enable structural sharing, their performance characteristics are different from mutable collections. When comparing to a locking mutable collection, the results will depend on lock contention and access patterns. However, taken from another blog post about the immutable collections:

Q: I’ve heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important?

A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.

In other words, in many cases the difference won't be noticeable and you should go with the simpler choice - which for concurrent sets would be to use ImmutableHashSet<T>, since you don't have an existing locking mutable implementation! :-)