interactive conditional histogram bucket slicing data visualization

python pandas data-visualization seaborn bokeh

In order to get the interaction effect you're looking for, you must bin all the columns you care about, together.

The cleanest way I can think of doing this is to stack into a single series then use pd.cut

Considering your sample df

df_ = pd.cut(df[['A', 'B']].stack(), 5, labels=list(range(5))).unstack()df_.columns = df_.columns.to_series() + 'bkt'pd.concat([df, df_], axis=1)

Let's build a better example and look at a visualization using seaborn

df = pd.DataFrame(dict(A=(np.random.randn(10000) * 100 + 20).astype(int),                       B=(np.random.randn(10000) * 100 - 20).astype(int)))import seaborn as snsdf.index = df.index.to_series().astype(str).radd('city')df_ = pd.cut(df[['A', 'B']].stack(), 30, labels=list(range(30))).unstack()df_.columns = df_.columns.to_series() + 'bkt'sns.jointplot(x=df_.Abkt, y=df_.Bbkt, kind="scatter", color="k")

Or how about some data with some correlation

mean, cov = [0, 1], [(1, .5), (.5, 1)]data = np.random.multivariate_normal(mean, cov, 100000)df = pd.DataFrame(data, columns=["A", "B"])df.index = df.index.to_series().astype(str).radd('city')df_ = pd.cut(df[['A', 'B']].stack(), 30, labels=list(range(30))).unstack()df_.columns = df_.columns.to_series() + 'bkt'sns.jointplot(x=df_.Abkt, y=df_.Bbkt, kind="scatter", color="k")

Interactive `bokeh`

Without getting too complicated

from bokeh.io import show, output_notebook, output_filefrom bokeh.plotting import figurefrom bokeh.layouts import row, columnfrom bokeh.models import ColumnDataSource, Select, CustomJSoutput_notebook()# generate random dataflips = np.random.choice((1, -1), (5, 5))flips = np.tril(flips, -1) + np.triu(flips, 1) + np.eye(flips.shape[0])half = np.ones((5, 5)) / 2cov = (half + np.diag(np.diag(half))) * flipsmean = np.zeros(5)data = np.random.multivariate_normal(mean, cov, 10000)df = pd.DataFrame(data, columns=list('ABCDE'))df.index = df.index.to_series().astype(str).radd('city')# Stack and cut to get dependent relationshipsb = 20df_ = pd.cut(df.stack(), b, labels=list(range(b))).unstack()# assign default columns x and y.  These will be the columns I set bokeh to readdf_[['x', 'y']] = df_.loc[:, ['A', 'B']]source = ColumnDataSource(data=df_)tools = 'box_select,pan,box_zoom,wheel_zoom,reset,resize,save'p = figure(plot_width=600, plot_height=300)p.circle('x', 'y', source=source, fill_color='olive', line_color='black', alpha=.5)def gcb(like, n):    code = """    var data = source.get('data');    var f = cb_obj.get('value');    data['{0}{1}'] = data[f];    source.trigger('change');    """    return CustomJS(args=dict(source=source), code=code.format(like, n))xcb = CustomJS(    args=dict(source=source),    code="""    var data = source.get('data');    var colm = cb_obj.get('value');    data['x'] = data[colm];    source.trigger('change');    """)ycb = CustomJS(    args=dict(source=source),    code="""    var data = source.get('data');    var colm = cb_obj.get('value');    data['y'] = data[colm];    source.trigger('change');    """)options = list('ABCDE')x_select = Select(options=options, callback=xcb, value='A')y_select = Select(options=options, callback=ycb, value='B')show(column(p, row(x_select, y_select)))

python pandas data-visualization seaborn bokeh

Here is a new solution using bokeh and HoloViews. It should respond a little more to the interactive part.

I try to remember that simple is beautiful when it comes to dataviz.

I used faker library in order to generate random city names to make the following graphs more realistic.

I will let all my codes here even if the most important part is the choice of the libraries.

import pandas as pdimport numpy as npfrom faker import Fakerdef generate_random_dataset(city_number,                         list_identifier,                        labels,                        bins,                        city_location='en_US'):    fake = Faker(locale=city_location)    df = pd.DataFrame(data=np.random.uniform(0, 100, len(list_identifier)]),                       index=[fake.city() for _ in range(city_number)],                       columns=list_identifier)    for name in list_identifier:        df[name + 'bkt'] =  pd.Series(pd.cut(df[name], bins, labels=labels))    return dflist_identifier=list('ABC')labels = ['Low', 'Medium', 'Average', 'Good', 'Great']bins = np.array([-1, 20, 40, 60, 80, 101])df = generate_random_dataset(30, list_identifier, labels, bins)df.head()

will output:

Sometimes, when your dataset is small, exposing a simple chart with colors could be enough.

from bokeh.charts import Bar, output_file, showfrom bokeh.layouts import columnbar = []for name in list_identifier:    bar.append(Bar(df, label='index', values=name, stack=name+'bkt',               title="percentage of " + name, legend='top_left', plot_width=1024))output_file('cities.html')show(column(bar))

Will create a new html page (cities) containing the graphs. Note that all the graphs generated with bokeh are interactive.

bokeh can't initially plot hexbin. However, HoloViews can. Thus, it allows to draw interactive plots whitin ipython notebook.

The syntax is quite straightforward, you just need a Matrix with two columns and call the hist method:

import holoviews as hvhv.notebook_extension('bokeh')df = generate_random_dataset(1000, list_identifier, list(range(5)), 5)points = hv.Points(np.column_stack((df.A, df.B)))points.hist(num_bins=5, dimension=['x', 'y'])

To compare with @piRSquared solution, I stole a bit of code (thank you btw :) to show the data with some correlation:

mean, cov = [0, 1], [(1, .5), (.5, 1)]data = np.random.multivariate_normal(mean, cov, 100000)df = pd.DataFrame(data, columns=["A", "B"])df.index = df.index.to_series().astype(str).radd('city')df_ = pd.cut(df[['A', 'B']].stack(), 30, labels=list(range(30))).unstack()df_.columns = df_.columns.to_series() + 'bkt'points = hv.Points(np.column_stack((df_.Abkt, df_.Bbkt)))points.hist(num_bins=5, dimension=['x', 'y'])

Please consider visit HoloViews tutorial.

python pandas data-visualization seaborn bokeh

As a newbie with insufficient rep, I can't comment, so I'm putting this here as an "answer," though it shouldn't be treated as one; these are just some incomplete suggestions in the same vein as the comments.

Along with the others, I like seaborn though I'm not sure those plots are interactive in the way you are seeking. While I haven't used bokeh, my understanding is that it provides more in the way of interactivity, but regardless of the package, as you move beyond 3 and 4 variables, you can only cram so much into one (family of) charts.

As for in your table directly, the aforementioned df.hist() (by lanery) is a good start. Once you have those bins, you can then play with the immensely powerful df.groupby() function. I've been using pandas for two years now, and that function STILL blows my mind. While not interactive, it will definitely help you slice and dice your data as you see fit.

CodeHunter

interactive conditional histogram bucket slicing data visualization

Interactive `bokeh`

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last

interactive conditional histogram bucket slicing data visualization

Interactive bokeh

Recent Posts

Interactive `bokeh`