How can I sort a boxplot in pandas by the median values?

How can I sort a boxplot in pandas by the median values?


You can use the answer in "How to sort a boxplot by the median values in pandas", but first you need to group your data and build a new data frame with one column per group:

    import pandas as pd
    import random
    import matplotlib.pyplot as plt

    n = 100
    # this is probably a strange way to generate random data; please feel free to correct it
    df = pd.DataFrame({"X": [random.choice(["A","B","C"]) for i in range(n)],
                       "Y": [random.choice(["a","b","c"]) for i in range(n)],
                       "Z": [random.gauss(0,1) for i in range(n)]})

    grouped = df.groupby(["X", "Y"])

    df2 = pd.DataFrame({col:vals['Z'] for col,vals in grouped})

    meds = df2.median()
    meds.sort_values(ascending=False, inplace=True)
    df2 = df2[meds.index]
    df2.boxplot()

    plt.show()

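To see why reordering the columns sorts the boxes, here is a minimal sketch on a tiny hand-made data set. The values are hypothetical, chosen only so that the group medians are easy to check by hand. It follows the same steps as the code above without plotting.

```python
import pandas as pd

# Hypothetical data: four (X, Y) groups with easily checked medians
df = pd.DataFrame({
    "X": ["A", "A", "A", "B", "B", "B"],
    "Y": ["a", "a", "b", "a", "a", "b"],
    "Z": [1.0, 3.0, 10.0, -5.0, -1.0, 4.0],
})

# One column per (X, Y) group. The Series are aligned on the original row
# index, so rows that belong to other groups show up as NaN in each column.
df2 = pd.DataFrame({col: vals["Z"] for col, vals in df.groupby(["X", "Y"])})

# median() skips NaN by default, so each column median is the group median
meds = df2.median().sort_values(ascending=False)

# Selecting the columns in the order of meds.index is what sorts the boxes,
# because boxplot() draws one box per column, left to right
df2 = df2[meds.index]

print(meds)
print(list(df2.columns))
```

The column labels are `(X, Y)` tuples, so each box in the plot is labelled with its group combination.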


This is similar to Alvaro Fuentes' answer, but wrapped in a function so it is easier to reuse:

    import pandas as pd

    def boxplot_sorted(df, by, column):
      df2 = pd.DataFrame({col:vals[column] for col, vals in df.groupby(by)})
      meds = df2.median().sort_values()
      df2[meds.index].boxplot(rot=90)

    boxplot_sorted(df, by=["X", "Y"], column="Z")


To answer the question in the title, without addressing the extra detail of plotting all combinations of two categorical variables:

    import numpy as np
    import pandas as pd
    import seaborn as sns

    n = 100
    df = pd.DataFrame({"Category": [np.random.choice(["A","B","C","D"]) for i in range(n)],
                       "Variable": [np.random.normal(0, 10) for i in range(n)]})

    # median of Variable per Category, sorted ascending; the index gives the box order
    grouped = df.loc[:,['Category', 'Variable']] \
        .groupby(['Category']) \
        .median() \
        .sort_values(by='Variable')

    sns.boxplot(x=df.Category, y=df.Variable, order=grouped.index)


I've added this solution because it is hard to reduce the accepted answer to a single variable, and I'm sure people are looking for a way to do that. I came to this question multiple times myself looking for such an answer.
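If seaborn is not available, the same single-variable ordering can be sketched with pandas alone. The helper names `median_order` and `boxplot_by_median` and the sample data below are my own illustrations, not part of any library. Treat this as a rough sketch rather than a polished implementation.

```python
import pandas as pd

def median_order(df, by, column, ascending=True):
    """Return the group labels of `by`, ordered by the median of `column`."""
    medians = df.groupby(by)[column].median()
    return medians.sort_values(ascending=ascending).index

def boxplot_by_median(df, by, column, ascending=True, **kwargs):
    """Draw one box per group of `by`, ordered by the group median of `column`."""
    order = median_order(df, by, column, ascending=ascending)
    # One column per group, built in median order; rows outside a group are NaN
    wide = pd.DataFrame({label: df.loc[df[by] == label, column] for label in order})
    # DataFrame.boxplot needs matplotlib installed; extra kwargs (e.g. rot=90) pass through
    return wide.boxplot(**kwargs)

# Hypothetical data with known medians: A -> 5.0, B -> 1.0, C -> 3.0
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Variable": [4.0, 5.0, 9.0, 0.0, 1.0, 2.0, 3.0, 3.0, 8.0],
})

order = median_order(df, "Category", "Variable")
print(list(order))
```

To draw the plot, call `boxplot_by_median(df, "Category", "Variable")` and then `matplotlib.pyplot.show()`. Keeping the ordering step in its own function also makes it usable as the `order` argument of `sns.boxplot`.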