Making use of all available RAM in a Haskell program?
Currently, GHC on Windows is a 32-bit GHC; I think a 64-bit GHC for Windows is supposed to become available with 7.6.
One consequence of that is that on Windows, you can't use more than 4G - 1 BLOCK of memory, since the maximum allowed as a size parameter is HS_WORD_MAX:
decodeSize(rts_argv[arg], 2, BLOCK_SIZE, HS_WORD_MAX) / BLOCK_SIZE;
With 32-bit Words, HS_WORD_MAX = 2^32-1.
That explains why "running ./mem.exe 42000000 +RTS -s -M4G errors out with -M4G: size outside allowed range", since decodeSize() decodes 4G as 2^32, which is one more than HS_WORD_MAX.
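The range check can be mimicked in a few lines of Haskell. This is only a sketch of the comparison, not the RTS code itself: `sizeInBytes` and `inRange32` are hypothetical helpers, and the K/M/G multipliers are assumed to be powers of 1024, which matches 4G being decoded as 2^32.

```haskell
import Data.Word (Word32, Word64)

-- Largest value a 32-bit word can hold; stands in for HS_WORD_MAX
-- on a 32-bit GHC.
hsWordMax32 :: Word64
hsWordMax32 = fromIntegral (maxBound :: Word32)

-- Hypothetical helper: a number plus a K/M/G suffix, converted to bytes.
sizeInBytes :: Word64 -> Char -> Word64
sizeInBytes n 'K' = n * 1024
sizeInBytes n 'M' = n * 1024 ^ (2 :: Int)
sizeInBytes n 'G' = n * 1024 ^ (3 :: Int)
sizeInBytes n _   = n

-- Hypothetical helper: True when the size does not exceed the 32-bit maximum.
inRange32 :: Word64 -> Bool
inRange32 bytes = bytes <= hsWordMax32

main :: IO ()
main = do
  print (sizeInBytes 4 'G', hsWordMax32)   -- (4294967296,4294967295)
  print (inRange32 (sizeInBytes 4 'G'))    -- False
  print (inRange32 (sizeInBytes 1800 'M')) -- True
```

So -M4G misses the allowed range by exactly one byte, while something like -M1800M is comfortably inside it.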
This limitation will remain even after you upgrade your GHC, until a 64-bit GHC for Windows is finally released.
For a 32-bit process, the user-mode virtual address space is limited to 2 or 4 GB (depending on the status of the IMAGE_FILE_LARGE_ADDRESS_AWARE flag), cf. Memory limits for Windows Releases.
Now, you are trying to construct a Set containing 42 million 4-byte Ints. A Data.Set.Set has five words of overhead per element (constructor, size, left and right subtree pointers, pointer to element), so the Set will take up about 0.94 GiB of memory (1.008 'metric' GB). But the process uses about twice that or more, because garbage collection needs extra space of at least the size of the live heap.
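The estimate can be reproduced with a little arithmetic. This is a rough sketch that assumes, as the figures above imply, five words of overhead plus one word for the element itself, with 4-byte words on a 32-bit GHC; `setBytes` and `toGiB` are names made up for the illustration.

```haskell
-- Bytes needed for a Set of the given number of elements, assuming five
-- words of overhead plus one word for the element, at the given word size.
setBytes :: Integer -> Integer -> Integer
setBytes elems wordSize = elems * (5 + 1) * wordSize

-- Convert a byte count to GiB (2^30 bytes).
toGiB :: Integer -> Double
toGiB b = fromIntegral b / 1024 ^ (3 :: Int)

main :: IO ()
main = do
  let bytes = setBytes 42000000 4
  print bytes               -- 1008000000
  print (toGiB bytes)       -- roughly 0.94
  print (2 * toGiB bytes)   -- roughly 1.88, with room for the GC to copy
```

Doubling the live data already brings the total close to 2 GiB, before counting anything else the process needs.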
Running the programme on my 64-bit Linux, with input 21000000 (to make up for the twice as large Ints and pointers), I get
$ ./mem +RTS -s -RTS 21000000
min: 0
max: 21000000
  31,330,814,200 bytes allocated in the heap
   4,708,535,032 bytes copied during GC
   1,157,426,280 bytes maximum residency (12 sample(s))
      13,669,312 bytes maximum slop
            2261 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     59971 colls,     0 par    2.73s    2.73s     0.0000s    0.0003s
  Gen  1        12 colls,     0 par    3.31s   10.38s     0.8654s    8.8131s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   12.12s  ( 13.33s elapsed)
  GC      time    6.03s  ( 13.12s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   18.15s  ( 26.45s elapsed)

  %GC     time      33.2%  (49.6% elapsed)

  Alloc rate    2,584,429,494 bytes per MUT second

  Productivity  66.8% of total user, 45.8% of total elapsed
but top reports only 1.1g of memory use; top, and presumably the Task Manager, reports only the live heap.
So it seems IMAGE_FILE_LARGE_ADDRESS_AWARE is not set, your process is limited to an address space of 2GB, and the 42-million-element Set needs more than that, unless you specify a maximum or suggested heap size that is smaller:
$ ./mem +RTS -s -M1800M -RTS 21000000
min: 0
max: 21000000
  31,330,814,200 bytes allocated in the heap
   3,551,201,872 bytes copied during GC
   1,157,426,280 bytes maximum residency (12 sample(s))
      13,669,312 bytes maximum slop
            1154 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     59971 colls,     0 par    2.70s    2.70s     0.0000s    0.0002s
  Gen  1        12 colls,     0 par    4.23s    4.85s     0.4043s    3.3144s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   11.99s  ( 12.00s elapsed)
  GC      time    6.93s  (  7.55s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   18.93s  ( 19.56s elapsed)

  %GC     time      36.6%  (38.6% elapsed)

  Alloc rate    2,611,793,025 bytes per MUT second

  Productivity  63.4% of total user, 61.3% of total elapsed
Setting the maximal heap size below what the programme would use naturally actually lets it fit in hardly more than the space needed for the Set, at the price of a slightly longer GC time. Suggesting a heap size with -H1800M lets it finish using only
1831 MB total memory in use (0 MB lost due to fragmentation)
So if you specify a maximal heap size below 2GB (but large enough for the Set to fit), it should work.
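For reference, here is a minimal sketch of the kind of programme being measured. The original source is not shown, so this is a guess reconstructed from the min:/max: output above: it builds a Set of the Ints from 0 to the number given on the command line. The function name `buildSet` and the default of 1000 are made up for the illustration.

```haskell
import qualified Data.Set as Set
import System.Environment (getArgs)

-- Build a Set of the Ints 0..n; an assumption about what the programme
-- under discussion does, not its actual source.
buildSet :: Int -> Set.Set Int
buildSet n = Set.fromList [0 .. n]

main :: IO ()
main = do
  args <- getArgs
  let n = case args of
        (a : _) -> read a
        []      -> 1000   -- small default so the sketch runs without arguments
      s = buildSet n
  putStrLn ("min: " ++ show (Set.findMin s))
  putStrLn ("max: " ++ show (Set.findMax s))
```

Assuming the file is saved as mem.hs, it could be built with ghc -O2 -rtsopts mem.hs and run as ./mem 42000000 +RTS -s -M1800M. As far as I know, GHC 7.0 and later only accept options such as -M and -H at run time if the programme was linked with -rtsopts.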
The default heap size is unlimited.
Using GHC 7.2 on a 64-bit Windows XP machine, I can allocate larger amounts by explicitly setting a larger heap size:
$ ./A 42000000 +RTS -s -H1.6G
min: 0
max: 42000000
  32,590,763,756 bytes allocated in the heap
   3,347,044,008 bytes copied during GC
     714,186,476 bytes maximum residency (4 sample(s))
       3,285,676 bytes maximum slop
            1651 MB total memory in use (0 MB lost due to fragmentation)
and
$ ./A 42000000 +RTS -s -H1.7G
min: 0
max: 42000000
  32,590,763,756 bytes allocated in the heap
   3,399,477,240 bytes copied during GC
     757,603,572 bytes maximum residency (4 sample(s))
       3,281,580 bytes maximum slop
            1754 MB total memory in use (0 MB lost due to fragmentation)
even:
$ ./A 42000000 +RTS -s -H1.85G
min: 0
max: 42000000
  32,590,763,784 bytes allocated in the heap
   3,492,115,128 bytes copied during GC
     821,240,344 bytes maximum residency (4 sample(s))
       3,285,676 bytes maximum slop
            1909 MB total memory in use (0 MB lost due to fragmentation)
That is, I can allocate up to the Windows XP 2G process limit. I imagine on Win 7 you won't have such a low limit (this table suggests either 4G or 192G), so just ask for as much as you need (and use a more recent GHC).