Creating a histogram for the data in Python


It looks like the only thing missing from your code is the handling of the last bin. In the numpy histogram, every bin except the last is half-open, while the last bin is closed (it includes both endpoints). In your code, all of the bins were half-open. (Source: see "Notes" in the np.histogram documentation.)

If a bin is defined by its edges, binmin and binmax, a value x is assigned to that bin if:

For the first n-1 bins: binmin <= x < binmax

For the last bin: binmin <= x <= binmax
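A minimal sketch of the rule above, using a tiny hand-made data set (the values and edges are illustrative, not taken from the question's file). The value 4 sits exactly on the right edge of the last bin and is still counted:

```python
import numpy as np

# Bin edges 1, 2, 3, 4 define three bins: [1, 2), [2, 3) and [3, 4].
values = [1, 2, 3, 4]
edges = [1, 2, 3, 4]

counts, returned_edges = np.histogram(values, bins=edges)

# 3 and 4 both land in the last bin, because that bin includes its right edge.
print(counts.tolist())  # [1, 1, 2]
```

With half-open bins everywhere, the 4 would be dropped and the last count would be 1 instead of 2.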

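Similarly, np.arange() works on a half-open interval (it excludes the stop value), so it would leave out the final bin edge. To get bins+1 edges that include the maximum, the code that follows uses np.linspace().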
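To illustrate the difference between the two functions, here is a short sketch with made-up limits (lo, hi and nbins are hypothetical names chosen for this example):

```python
import numpy as np

lo, hi, nbins = 0.0, 1.0, 4
step = (hi - lo) / nbins

# np.arange excludes the stop value, so the final edge (hi) is missing.
arange_edges = np.arange(lo, hi, step)

# np.linspace includes both endpoints: nbins bins need nbins + 1 edges.
linspace_edges = np.linspace(lo, hi, nbins + 1)

print(arange_edges.tolist())    # [0.0, 0.25, 0.5, 0.75]
print(linspace_edges.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```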

Consider the following:

    import numpy as np


    def histogram_using_numpy(filename, bins=10):
        # Read the first column and let numpy do the binning.
        datas = np.loadtxt(filename, delimiter=" ", usecols=(0,))
        hist, bin_edges = np.histogram(datas, bins)
        return hist, bin_edges


    def histogram_using_list(filename, bins=10, take_col=0):
        # Read one column of the file into a plain list of floats.
        with open(filename, "r") as f:
            data = [float(line.split()[take_col]) for line in f]
        mi, ma = min(data), max(data)

        def get_count(lis, binmin, binmax, inclusive_endpoint=False):
            # Half-open bin [binmin, binmax); the last bin also counts x == binmax.
            count = 0
            for item in lis:
                if item >= binmin and item < binmax:
                    count += 1
                elif inclusive_endpoint and item == binmax:
                    count += 1
            return count

        bin_edges = np.linspace(mi, ma, bins + 1)
        tot = []
        # list() so that len() works on Python 3, where zip() is lazy.
        binlims = list(zip(bin_edges[0:-1], bin_edges[1:]))
        for i, (binmin, binmax) in enumerate(binlims):
            inclusive = (i == (len(binlims) - 1))
            tot.append(get_count(data, binmin, binmax, inclusive))
        return tot, bin_edges


    nump_hist, nump_bin_edges = histogram_using_numpy("ex.txt", bins=15)
    func_hist, func_bin_edges = histogram_using_list("ex.txt", bins=15)

    print("Histogram:")
    print("  From numpy:      %s" % list(nump_hist))
    print("  From my function %s" % list(func_hist))
    print("")
    print("Bin Edges:")
    print("  From numpy:      %s" % nump_bin_edges)
    print("  From my function %s" % func_bin_edges)

Which, for bins=10, outputs:

    Histogram:
      From numpy:      [10, 19, 20, 28, 15, 16, 14, 11, 5, 12]
      From my function [10, 19, 20, 28, 15, 16, 14, 11, 5, 12]

    Bin Edges:
      From numpy:      [ 4.3   4.66  5.02  5.38  5.74  6.1   6.46  6.82  7.18  7.54  7.9 ]
      From my function [ 4.3   4.66  5.02  5.38  5.74  6.1   6.46  6.82  7.18  7.54  7.9 ]

And for bins=15, outputs:

    Histogram:
      From numpy:      [7, 4, 18, 19, 5, 24, 8, 10, 13, 6, 13, 6, 5, 1, 11]
      From my function [7, 4, 18, 19, 5, 24, 8, 10, 13, 6, 13, 6, 5, 1, 11]

    Bin Edges:
      From numpy:      [ 4.3   4.54  4.78  5.02  5.26  5.5   5.74  5.98  6.22  6.46  6.7   6.94  7.18  7.42  7.66  7.9 ]
      From my function [ 4.3   4.54  4.78  5.02  5.26  5.5   5.74  5.98  6.22  6.46  6.7   6.94  7.18  7.42  7.66  7.9 ]
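If you don't have the data file at hand, the same comparison can be reproduced on synthetic data. The following is only a sketch under stated assumptions: count_in_bins is a hypothetical helper written for this example, the random sample merely stands in for the real file, and the edges are passed explicitly to np.histogram so that both methods bin against identical values.

```python
import numpy as np


def count_in_bins(data, bins=10):
    """Count values per bin; only the last bin includes its right edge."""
    data = np.asarray(data, dtype=float)
    edges = np.linspace(data.min(), data.max(), bins + 1)
    counts = []
    for i in range(bins):
        lo, hi = edges[i], edges[i + 1]
        if i == bins - 1:
            in_bin = (data >= lo) & (data <= hi)   # closed last bin
        else:
            in_bin = (data >= lo) & (data < hi)    # half-open bin
        counts.append(int(in_bin.sum()))
    return counts, edges


# Synthetic stand-in for the data file (an assumption, not the real data).
rng = np.random.default_rng(0)
sample = rng.uniform(4.3, 7.9, size=150)

my_counts, my_edges = count_in_bins(sample, bins=15)
np_counts, np_edges = np.histogram(sample, bins=my_edges)

# True when the manual counts agree with numpy's.
print(my_counts == np_counts.tolist())
```

Because the maximum of the sample is itself the right edge of the last bin, dropping the closed-endpoint branch would lose at least one value and the counts would no longer add up to the sample size.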