
Memory-Mapping Slows Down Over Time, Alternatives?


HDDs are poor at "serving more than one master" -- the slowdown can be much larger than one might expect. To demonstrate, I used this code to read the backup files (about 50 MB each) on the HDD of my Ubuntu 12.04 machine:

import os, random, time

bdir = '/hdd/backup/'
fns = os.listdir(bdir)

while True:
    fn = random.choice(fns)
    if not fn.startswith("duplicity-full."):
        continue
    ts = time.time()
    with open(bdir + fn, 'rb') as f:
        c = f.read()
    print "MB/s: %.1f" % (len(c) / (1000000 * (time.time() - ts)))

Running one of these "processes" gives me decent read performance:

MB/s: 148.6
MB/s: 169.1
MB/s: 184.1
MB/s: 188.1
MB/s: 185.3
MB/s: 146.2

Adding a second such process in parallel slows things down by more than an order of magnitude:

MB/s: 14.3
MB/s: 11.6
MB/s: 12.7
MB/s: 8.7
MB/s: 8.2
MB/s: 15.9

My guess is that this (i.e., other processes using the same HDD) is the reason for your inconsistent performance. My hunch is an SSD would do significantly better. On my machine, for large files on the SSD, the slowdown due to a parallel reader process was only twofold, from about 440 MB/s to about 220 MB/s. (See my comment.)


You might consider using bcolz. It compresses numerical data on disk and in memory to speed things up. You may have to transpose your matrices to get efficient sparse reads, since bcolz stores data by column rather than by row.
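As a minimal sketch of that idea (the rootdir path, matrix shape, and compression level below are placeholders, not anything from your setup):

import numpy as np
import bcolz

# Hypothetical dense matrix to be stored compressed on disk.
data = np.random.rand(10000, 1000)

# Write the matrix into an on-disk, chunked, compressed carray.
ca = bcolz.carray(data, rootdir='/hdd/backup/matrix.bcolz', mode='w',
                  cparams=bcolz.cparams(clevel=5))
ca.flush()

# Later (possibly from another process): reopen and slice; only the
# chunks covering the requested rows are read from disk and decompressed.
ca = bcolz.open('/hdd/backup/matrix.bcolz', mode='r')
rows = ca[100:200]

Because each slice touches only the chunks it needs, a reader competing with other disk users moves far less data than a full-file read, which is the point of reaching for compression here.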