Linux/perl mmap performance
OK, found the problem. As suspected, neither Linux nor Perl was to blame. To open and access the file I do something like this:
```perl
#!/usr/bin/perl
# Create 1 GB file if you do not have one:
# dd if=/dev/urandom of=test.bin bs=1048576 count=1000
use strict; use warnings;
use Sys::Mmap;

open (my $fh, "<test.bin") || die "open: $!";
my $t = time;
print STDERR "mmapping.. ";
mmap (my $mh, 0, PROT_READ, MAP_SHARED, $fh) || die "mmap: $!";
my $str = unpack ("A1024", substr ($mh, 0, 1024));
print STDERR " ", time-$t, " seconds\nsleeping..";
sleep (60*60);
```
If you test that code, there are no delays like those I found in my original code, and after creating the minimal sample (always do that, right!) the reason suddenly became obvious.
The error was that in my code I treated the $mh scalar as a handle, something lightweight that can be passed around cheaply (read: pass by value). It turns out it is actually a gigabyte-long string, definitely not something you want to copy around without creating an explicit reference (Perl lingo for a "pointer"/handle value). So if you need to store it in a hash or similar, make sure you store \$mh, and dereference it when you need to use it, like ${$hash->{mh}}, typically as the first parameter to substr or similar.
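To make the difference concrete, here is a minimal sketch of storing the scalar by value versus by reference. It assumes an ordinary 16 MB string as a stand-in for the mmapped scalar so that it runs without Sys::Mmap; the hash names %by_value and %by_ref are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the mmapped scalar: an ordinary 16 MB string.
# (Assumption: with Sys::Mmap, $mh would be the variable passed to mmap().)
my $mh = "x" x (16 * 1024 * 1024);

# Expensive: assigning the scalar copies the whole string into the hash.
my %by_value = (mh => $mh);

# Cheap: store a reference to the scalar; no string data is copied.
my %by_ref = (mh => \$mh);

# Dereference at the point of use, e.g. as the first argument to substr.
my $chunk = substr(${ $by_ref{mh} }, 0, 1024);

print length($chunk), "\n";
```

With a real 1 GB mapping, the by-value assignment is where the delay would show up, since Perl has to copy the entire string.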
On 32-bit systems the address space available for mmap() calls is rather limited (and varies from OS to OS). Be aware of that if you're using multi-gigabyte files and you are only testing on a 64-bit system. (I would have preferred to write this in a comment but I don't have enough reputation points yet)
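One way to guard against the 32-bit caveat is to check the pointer size of the running perl before mapping a large file. This is only a sketch: the helper name mmap_looks_safe and the 1 GB threshold are assumptions for illustration, and the real usable limit depends on the OS and on what else occupies the address space.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Hypothetical threshold: treat anything over 1 GB as risky on a 32-bit perl.
my $limit = 1024 * 1024 * 1024;

# Returns 1 if mapping a file of $size bytes looks reasonable for a perl
# built with pointers of $ptrsize bytes, 0 otherwise.
sub mmap_looks_safe {
    my ($size, $ptrsize) = @_;
    return 1 if $ptrsize >= 8;          # 64-bit address space
    return $size <= $limit ? 1 : 0;     # 32-bit: apply the assumed limit
}

my $file = "test.bin";
my $size = (-s $file) || 0;             # 0 if the file is missing or empty
print mmap_looks_safe($size, $Config{ptrsize}) ? "ok\n" : "too large\n";
```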