Profiling multithreading performance in a Haskell program — no speedups using parallel strategies
Sorry that I couldn't provide code in a timely manner to assist respondents. It took me a while to untangle the exact location of the issue.
The problem was as follows: I was fmapping a function
f :: a -> S b
over the traversable data structure
structure :: T a
where S and T are two traversable functors.
Then, when using parTraversable, I was mistakenly writing
Compose (fmap f structure) `using` parTraversable rdeepseq
instead of
Compose (fmap f structure `using` parTraversable rdeepseq)
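so I was wrongly using the Traversable instance for Compose T S (from Data.Functor.Compose) to do the multithreading. That instance traverses through both layers, so a spark was created for each individual b rather than for each S b.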
(This looks like it should've been easy to catch, but it took me a while to extract the above mistake from the code!)
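For anyone hitting the same thing, here is a minimal sketch of the two groupings side by side. The names expand, sparkPerLeaf and sparkPerBranch are made up for illustration, and I'm using lists for both T and S; my real code looks different.

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parTraversable, rdeepseq, using)
import Data.Functor.Compose (Compose (..))

-- Hypothetical stand-in for f :: a -> S b, with S = [].
expand :: Int -> [Int]
expand n = [n * k | k <- [1 .. 3]]

-- Mistaken grouping: the strategy is applied to the Compose value, so
-- parTraversable uses Traversable (Compose t s) and sparks once per
-- innermost b.
sparkPerLeaf
  :: (Traversable t, Traversable s, NFData b)
  => (a -> s b) -> t a -> Compose t s b
sparkPerLeaf f structure =
  Compose (fmap f structure) `using` parTraversable rdeepseq

-- Intended grouping: the strategy is applied to the t (s b) value before
-- wrapping, so parTraversable uses Traversable t and sparks once per s b.
-- Parenthesized explicitly so the grouping does not depend on operator fixity.
sparkPerBranch
  :: (Traversable t, NFData (s b))
  => (a -> s b) -> t a -> Compose t s b
sparkPerBranch f structure =
  Compose (fmap f structure `using` parTraversable rdeepseq)

main :: IO ()
main = do
  let input = [1 .. 5] :: [Int]
  -- A strategy only changes how evaluation happens, not the resulting value,
  -- so the two versions differ in spark granularity only.
  print (getCompose (sparkPerLeaf expand input)
           == getCompose (sparkPerBranch expand input))
```

The two type signatures already show the difference: the first needs NFData b, the second NFData (s b). To see any effect on sparks, build with -threaded and run with +RTS -N.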
With this change, performance now looks much better!