NumPy: calculate averages with NaNs removed


I think what you want is a masked array:

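    import numpy as np

    dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
    # Mask every NaN so the mean ignores it
    mdat = np.ma.masked_array(dat, np.isnan(dat))
    mm = np.mean(mdat, axis=1)
    # Fully masked rows come back masked; fill them with NaN
    print(mm.filled(np.nan))  # the desired answer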
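For comparison, here is a sketch (assuming a reasonably recent NumPy: `np.nanmean` appeared in 1.8 and the `equal_nan` argument of `np.allclose` in 1.10) showing that `np.ma.masked_invalid`, which builds the mask in one call, and `np.nanmean` give the same row means on this example. `np.nanmean` emits a `RuntimeWarning` ("Mean of empty slice") for the all-NaN row and returns NaN there, so the sketch silences that warning.

```python
import warnings

import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# Option 1: explicit mask built from np.isnan (as above)
mm1 = np.ma.masked_array(dat, np.isnan(dat)).mean(axis=1).filled(np.nan)

# Option 2: masked_invalid masks NaN (and +/-inf) in a single call
mm2 = np.ma.masked_invalid(dat).mean(axis=1).filled(np.nan)

# Option 3: np.nanmean skips NaNs directly; it warns for rows that are
# entirely NaN, so that RuntimeWarning is suppressed here
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    mm3 = np.nanmean(dat, axis=1)

print(mm1)
print(np.allclose(mm1, mm2, equal_nan=True), np.allclose(mm1, mm3, equal_nan=True))
```

The masked-array route keeps the "missing" information explicit, which is handy if you need to know which rows had no valid data at all.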

Edit: timings for all of the suggested approaches, combined:

    from timeit import Timer

    setupstr = """
    import numpy as np
    dat = np.random.normal(size=(1000, 1000))
    ii = np.ix_(np.random.randint(0, 99, size=50), np.random.randint(0, 99, size=50))
    dat[ii] = np.nan
    """

    method1 = """
    mdat = np.ma.masked_array(dat, np.isnan(dat))
    mm = np.mean(mdat, axis=1)
    mm.filled(np.nan)
    """

    N = 2
    t1 = Timer(method1, setupstr).timeit(N)
    t2 = Timer("[np.mean([l for l in d if not np.isnan(l)]) for d in dat]", setupstr).timeit(N)
    t3 = Timer("np.array([r[np.isfinite(r)].mean() for r in dat])", setupstr).timeit(N)
    t4 = Timer("np.ma.masked_invalid(dat).mean(axis=1)", setupstr).timeit(N)
    # scipy.stats.nanmean, used in the original timing, has been removed from SciPy;
    # np.nanmean (NumPy >= 1.8) is the equivalent
    t5 = Timer("np.nanmean(dat, axis=1)", setupstr).timeit(N)

    for t in (t1, t2, t3, t4, t5):
        print('Time: %f\tRatio: %f' % (t, t / t1))

Output (total seconds for N = 2 runs of each statement; Ratio is relative to the masked-array method):

    Time: 0.045454  Ratio: 1.000000
    Time: 8.179479  Ratio: 179.950595
    Time: 0.060988  Ratio: 1.341755
    Time: 0.070955  Ratio: 1.561029
    Time: 0.065152  Ratio: 1.433364
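The timings only measure speed. A quick sanity check that the benchmarked approaches compute the same row means could look like the sketch below. It assumes NumPy >= 1.17 for `np.random.default_rng`, and uses a smaller, seeded version of the benchmark data (the 200x200 size and seed are arbitrary choices). NaNs only land in the top-left 99x99 block, so every row keeps some finite values. Note that `np.isfinite` and `np.ma.masked_invalid` also drop `inf`, whereas `np.nanmean` only skips NaN, so results can differ if your data contain infinities.

```python
import numpy as np

rng = np.random.default_rng(0)
# Smaller version of the benchmark data: NaNs only in the top-left 99x99 block,
# so every row still has finite values
dat = rng.normal(size=(200, 200))
ii = np.ix_(rng.integers(0, 99, size=50), rng.integers(0, 99, size=50))
dat[ii] = np.nan

results = {
    "masked_array": np.ma.masked_array(dat, np.isnan(dat)).mean(axis=1).filled(np.nan),
    "list_comprehension": np.array([np.mean([l for l in d if not np.isnan(l)]) for d in dat]),
    "isfinite": np.array([r[np.isfinite(r)].mean() for r in dat]),
    "masked_invalid": np.ma.masked_invalid(dat).mean(axis=1).filled(np.nan),
    "nanmean": np.nanmean(dat, axis=1),
}

# Compare every method against the masked-array result (tolerates rounding differences)
reference = results["masked_array"]
for name, res in results.items():
    print(f"{name:20s} agrees: {np.allclose(res, reference)}")
```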


If performance matters, you should use bottleneck.nanmean() instead:

http://pypi.python.org/pypi/Bottleneck
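A minimal sketch, assuming the package is importable as `bottleneck` (its import name on PyPI). It falls back to `np.nanmean` when Bottleneck is not installed; `bottleneck.nanmean` takes the same `(a, axis)` arguments used here, so the call site stays the same either way.

```python
import numpy as np

try:
    # Bottleneck: NaN-aware NumPy array functions written in C
    import bottleneck as bn
    nanmean = bn.nanmean
except ImportError:
    # Fall back to NumPy if Bottleneck is not installed
    nanmean = np.nanmean

dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan]])
row_means = nanmean(dat, axis=1)
print(row_means)
```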