Circular / polar histogram in python Circular / polar histogram in python python python

Circular / polar histogram in python


Building off of this example from the gallery, you can do

enter image description here

import numpy as npimport matplotlib.pyplot as pltN = 80bottom = 8max_height = 4theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)radii = max_height*np.random.rand(N)width = (2*np.pi) / Nax = plt.subplot(111, polar=True)bars = ax.bar(theta, radii, width=width, bottom=bottom)# Use custom colors and opacityfor r, bar in zip(radii, bars):    bar.set_facecolor(plt.cm.jet(r / 10.))    bar.set_alpha(0.8)plt.show()

Of course, there are many variations and tweeks, but this should get you started.

In general, a browse through the matplotlib gallery is usually a good place to start.

Here, I used the bottom keyword to leave the center empty, because I think I saw an earlier question by you with a graph more like what I have, so I assume that's what you want. To get the full wedges that you show above, just use bottom=0 (or leave it out since 0 is the default).


Quick answer

Use the function circular_hist() I wrote below.

By default this function plots frequency proportional to area, not radius (the reasoning behind this decision is offered below under "longer form answer").

def circular_hist(ax, x, bins=16, density=True, offset=0, gaps=True):    """    Produce a circular histogram of angles on ax.    Parameters    ----------    ax : matplotlib.axes._subplots.PolarAxesSubplot        axis instance created with subplot_kw=dict(projection='polar').    x : array        Angles to plot, expected in units of radians.    bins : int, optional        Defines the number of equal-width bins in the range. The default is 16.    density : bool, optional        If True plot frequency proportional to area. If False plot frequency        proportional to radius. The default is True.    offset : float, optional        Sets the offset for the location of the 0 direction in units of        radians. The default is 0.    gaps : bool, optional        Whether to allow gaps between bins. When gaps = False the bins are        forced to partition the entire [-pi, pi] range. The default is True.    Returns    -------    n : array or list of arrays        The number of values in each bin.    bins : array        The edges of the bins.    patches : `.BarContainer` or list of a single `.Polygon`        Container of individual artists used to create the histogram        or list of such containers if there are multiple input datasets.    """    # Wrap angles to [-pi, pi)    x = (x+np.pi) % (2*np.pi) - np.pi    # Force bins to partition entire circle    if not gaps:        bins = np.linspace(-np.pi, np.pi, num=bins+1)    # Bin data and record counts    n, bins = np.histogram(x, bins=bins)    # Compute width of each bin    widths = np.diff(bins)    # By default plot frequency proportional to area    if density:        # Area to assign each bin        area = n / x.size        # Calculate corresponding bin radius        radius = (area/np.pi) ** .5    # Otherwise plot frequency proportional to radius    else:        radius = n    # Plot data on ax    patches = ax.bar(bins[:-1], radius, zorder=1, align='edge', width=widths,                     edgecolor='C0', fill=False, linewidth=1)    # Set the direction of the zero angle    ax.set_theta_offset(offset)    # Remove ylabels for area plots (they are mostly obstructive)    if density:        ax.set_yticks([])    return n, bins, patches

Example usage:

import matplotlib.pyplot as pltimport numpy as npangles0 = np.random.normal(loc=0, scale=1, size=10000)angles1 = np.random.uniform(0, 2*np.pi, size=1000)# Construct figure and axis to plot onfig, ax = plt.subplots(1, 2, subplot_kw=dict(projection='polar'))# Visualise by area of binscircular_hist(ax[0], angles0)# Visualise by radius of binscircular_hist(ax[1], angles1, offset=np.pi/2, density=False)

example images

Longer form answer

I'd always recommend caution when using circular histograms as they can easily mislead readers.

In particular, I'd advise staying away from circular histograms where frequency and radius are plotted proportionally. I recommend this because the mind is greatly affected by the area of the bins, not just by their radial extent. This is similar to how we're used to interpreting pie charts: by area.

So, instead of using the radial extent of a bin to visualise the number of data points it contains, I'd recommend visualising the number of points by area.

The problem

Consider the consequences of doubling the number of data points in a given histogram bin. In a circular histogram where frequency and radius are proportional, the radius of this bin will increase by a factor of 2 (as the number of points has doubled). However, the area of this bin will have been increased by a factor of 4! This is because the area of the bin is proportional to the radius squared.

If this doesn't sound like too much of a problem yet, let's see it graphically:

frequency histograms

Both of the above plots visualise the same data points.

In the lefthand plot it's easy to see that there are twice as many data points in the (0, pi/4) bin than there are in the (-pi/4, 0) bin.

However, take a look at the right hand plot (frequency proportional to radius). At first glance your mind is greatly affected by the area of the bins. You'd be forgiven for thinking there are more than twice as many points in the (0, pi/4) bin than in the (-pi/4, 0) bin. However, you'd have been misled. It is only on closer inspection of the graphic (and of the radial axis) that you realise there are exactly twice as many data points in the (0, pi/4) bin than in the (-pi/4, 0) bin. Not more than twice as many, as the graph may have originally suggested.

The above graphics can be recreated with the following code:

import numpy as npimport matplotlib.pyplot as pltplt.style.use('seaborn')# Generate data with twice as many points in (0, np.pi/4) than (-np.pi/4, 0)angles = np.hstack([np.random.uniform(0, np.pi/4, size=100),                    np.random.uniform(-np.pi/4, 0, size=50)])bins = 2fig = plt.figure()ax = fig.add_subplot(1, 2, 1)polar_ax = fig.add_subplot(1, 2, 2, projection="polar")# Plot "standard" histogramax.hist(angles, bins=bins)# Fiddle with labels and limitsax.set_xlim([-np.pi/4, np.pi/4])ax.set_xticks([-np.pi/4, 0, np.pi/4])ax.set_xticklabels([r'$-\pi/4$', r'$0$', r'$\pi/4$'])# bin data for our polar histogramcount, bin = np.histogram(angles, bins=bins)# Plot polar histogrampolar_ax.bar(bin[:-1], count, align='edge', color='C0')# Fiddle with labels and limitspolar_ax.set_xticks([0, np.pi/4, 2*np.pi - np.pi/4])polar_ax.set_xticklabels([r'$0$', r'$\pi/4$', r'$-\pi/4$'])polar_ax.set_rlabel_position(90)

A solution

Since we are so greatly affected by the area of the bins in circular histograms, I find it more effective to ensure that the area of each bin is proportional to the number of observations in it, instead of the radius. This is similar to how we are used to interpreting pie charts, where area is the quantity of interest.

Let's use the dataset we used in the previous example to reproduce the graphics based on area, instead of radius:

density histograms

I believe readers have less chance of being misled at first glance of this graphic.

However, when plotting a circular histogram with area proportional to radius we have the disadvantage that you'd never have known that there are exactly twice as many points in the (0, pi/4) bin than in the (-pi/4, 0) bin just by eyeballing the areas. Although, you could counter this by annotating each bin with its corresponding density. I think this disadvantage is preferable to misleading a reader.

Of course I'd ensure that an informative caption was placed alongside this figure to explain that here we visualise frequency with area, not radius.

The above plots were created as:

fig = plt.figure()ax = fig.add_subplot(1, 2, 1)polar_ax = fig.add_subplot(1, 2, 2, projection="polar")# Plot "standard" histogramax.hist(angles, bins=bins, density=True)# Fiddle with labels and limitsax.set_xlim([-np.pi/4, np.pi/4])ax.set_xticks([-np.pi/4, 0, np.pi/4])ax.set_xticklabels([r'$-\pi/4$', r'$0$', r'$\pi/4$'])# bin data for our polar histogramcounts, bin = np.histogram(angles, bins=bins)# Normalise counts to compute areasarea = counts / angles.size# Compute corresponding radii from areasradius = (area / np.pi)**.5polar_ax.bar(bin[:-1], radius, align='edge', color='C0')# Label angles according to conventionpolar_ax.set_xticks([0, np.pi/4, 2*np.pi - np.pi/4])polar_ax.set_xticklabels([r'$0$', r'$\pi/4$', r'$-\pi/4$'])