Is it possible to share a HashMap between threads without locking the entire HashMap?
I don't see how your request is possible, at least not without some exceedingly clever lock-free data structures; what should happen if multiple threads need to insert new values that hash to the same location?
In previous work, I've used a `RwLock<HashMap<K, Mutex<V>>>`. When inserting a value into the hash, you get an exclusive lock for a short period. The rest of the time, you can have multiple threads with reader locks to the `HashMap` and thus to a given element. If they need to mutate the data, they can get exclusive access to the `Mutex`.
Here's an example:
```rust
use std::{
    collections::HashMap,
    sync::{Arc, Mutex, RwLock},
    thread,
    time::Duration,
};

fn main() {
    let data = Arc::new(RwLock::new(HashMap::new()));

    let threads: Vec<_> = (0..10)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();

    for t in threads {
        t.join().expect("Thread panicked");
    }

    println!("{:?}", data);
}

fn worker_thread(id: u8, data: Arc<RwLock<HashMap<u8, Mutex<i32>>>>) {
    loop {
        // Assume that the element already exists
        let map = data.read().expect("RwLock poisoned");

        if let Some(element) = map.get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");

            // Perform our normal work updating a specific element.
            // The entire HashMap only has a read lock, which
            // means that other threads can access it.
            *element += 1;
            thread::sleep(Duration::from_secs(1));

            return;
        }

        // If we got this far, the element doesn't exist.
        // Get rid of our read lock and switch to a write lock.
        // You want to minimize the time we hold the writer lock.
        drop(map);
        let mut map = data.write().expect("RwLock poisoned");

        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
        thread::sleep(Duration::from_millis(50));
        map.entry(id).or_insert_with(|| Mutex::new(0));

        // Let the loop start us over to try again
    }
}
```
This takes about 2.7 seconds to run on my machine, even though it starts 10 threads that each wait for 1 second while holding the exclusive lock to the element's data.
This solution isn't without issues, however. When there's a large amount of contention for that one master lock, getting a write lock can take a while, which completely kills parallelism.
In that case, you can switch to a `RwLock<HashMap<K, Arc<Mutex<V>>>>`. Once you have a read or write lock, you can then clone the `Arc` of the value, returning it and unlocking the hashmap.
The next step up would be to use a crate like arc-swap, which says:

> Then one would lock, clone the [`RwLock<Arc<T>>`] and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the `Arc`) which makes it relatively slow. Depending on the implementation, an update may be blocked for arbitrarily long time by a steady inflow of readers.
>
> The `ArcSwap` can be used instead, which solves the above problems and has better performance characteristics than the `RwLock`, both in contended and non-contended scenarios.
I often advocate for a smarter algorithm of some kind. For example, you could spin up N threads, each with its own `HashMap`. You then shard work among them. For the simple example above, you could shard by `id % N_THREADS`. There are also more complicated sharding schemes that depend on your data.
As Go has done a good job of evangelizing: do not communicate by sharing memory; instead, share memory by communicating.
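As a sketch of that idea (names like `sharded_counts` and the fixed `N_THREADS` are assumptions for illustration): each worker owns its `HashMap` outright and receives keys over an `mpsc` channel, so no locking is needed anywhere:

```rust
use std::{collections::HashMap, sync::mpsc, thread};

const N_THREADS: usize = 4;

// Count key occurrences by sharding work across N_THREADS workers.
// Each worker owns its map outright; communication replaces locking.
fn sharded_counts(keys: &[u64]) -> Vec<HashMap<u64, u64>> {
    let (senders, handles): (Vec<_>, Vec<_>) = (0..N_THREADS)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<u64>();
            let handle = thread::spawn(move || {
                let mut map = HashMap::new();
                for key in rx {
                    *map.entry(key).or_insert(0u64) += 1;
                }
                map // returned once the channel closes
            });
            (tx, handle)
        })
        .unzip();

    for &key in keys {
        // Shard by key; a real scheme might hash instead.
        senders[key as usize % N_THREADS].send(key).unwrap();
    }
    drop(senders); // close all channels so the workers finish

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let maps = sharded_counts(&[1, 2, 2, 5, 5, 5]);
    let total: u64 = maps.iter().flat_map(|m| m.values()).sum();
    println!("{total}"); // prints 6
}
```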
Maybe you want to consider evmap:

> A lock-free, eventually consistent, concurrent multi-value map.
The trade-off is eventual-consistency: Readers do not see changes until the writer refreshes the map. A refresh is atomic and the writer decides when to do it and expose new data to the readers.
Suppose the key of the data can be mapped to a `u8`. You can then have an `Arc<HashMap<u8, Mutex<HashMap<Key, Value>>>>`. When you initialize the data structure, you populate the entire first-level map before putting it in the `Arc` (it will be immutable after initialization).
When you want a value from the map, you need to do a double get, something like:

```rust
data.get(&map_to_u8(&key)).unwrap().lock().expect("poison").get(&key)
```

where the `unwrap` is safe because we initialized the first map with every key.
To write to the map, do something like:

```rust
data.get(&map_to_u8(&id)).unwrap().lock().expect("poison").entry(id).or_insert_with(|| value);
```
It's easy to see that contention is reduced: we now have 256 `Mutex`es, and the probability of multiple threads contending for the same `Mutex` is low.
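The `map_to_u8` helper is left undefined above; assuming the key type implements `Hash`, one possible sketch is to hash the key and keep the low byte:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: reduce any hashable key to one of the 256 shards.
fn map_to_u8<K: Hash>(key: &K) -> u8 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() as u8 // keep the low byte of the 64-bit hash
}

fn main() {
    // The same key always lands in the same shard.
    assert_eq!(map_to_u8(&"hello"), map_to_u8(&"hello"));
}
```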
@Shepmaster's example with 100 threads takes about 10 seconds on my machine; the following example takes a little more than 1 second.
```rust
use std::{
    collections::HashMap,
    sync::{Arc, Mutex},
    thread,
    time::Duration,
};

fn main() {
    let mut inner = HashMap::new();
    for i in 0..=u8::max_value() {
        inner.insert(i, Mutex::new(HashMap::new()));
    }

    let data = Arc::new(inner);

    let threads: Vec<_> = (0..100)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || worker_thread(i, data))
        })
        .collect();

    for t in threads {
        t.join().expect("Thread panicked");
    }

    println!("{:?}", data);
}

fn worker_thread(id: u8, data: Arc<HashMap<u8, Mutex<HashMap<u8, Mutex<i32>>>>>) {
    loop {
        // The first unwrap is safe because we populated an entry for every `u8`
        if let Some(element) = data.get(&id).unwrap().lock().expect("poison").get(&id) {
            let mut element = element.lock().expect("Mutex poisoned");

            // Perform our normal work updating a specific element.
            // Only one of the 256 shards is locked, which
            // means that other threads can access the rest of the map.
            *element += 1;
            thread::sleep(Duration::from_secs(1));

            return;
        }

        // If we got this far, the element doesn't exist.
        // We use HashMap::entry to handle the case where another thread
        // inserted the same key while we were unlocked.
        thread::sleep(Duration::from_millis(50));
        data.get(&id)
            .unwrap()
            .lock()
            .expect("poison")
            .entry(id)
            .or_insert_with(|| Mutex::new(0));

        // Let the loop start us over to try again
    }
}
```