sorting by a custom list in pandas

python sorting pandas

I just discovered that with pandas 15.1 it is possible to use categorical series (http://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html#categoricals)

As for your example, lets define the same data-frame and sorter:

import pandas as pddata = {    'id': [2967, 5335, 13950, 6141, 6169],    'Player': ['Cedric Hunter', 'Maurice Baker',                'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],    'Year': [1991, 2004, 2001, 2009, 1997],    'Age': [27, 25, 22, 34, 31],    'Tm': ['CHH', 'VAN', 'TOT', 'OKC', 'DAL'],    'G': [6, 7, 60, 52, 81]}# Create DataFramedf = pd.DataFrame(data)# Define the sortersorter = ['TOT', 'ATL', 'BOS', 'BRK', 'CHA', 'CHH', 'CHI', 'CLE', 'DAL', 'DEN',          'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL',          'MIN', 'NJN', 'NOH', 'NOK', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI',          'PHO', 'POR', 'SAC', 'SAS', 'SEA', 'TOR', 'UTA', 'VAN', 'WAS', 'WSB']

With the data-frame and sorter, which is a category-order, we can do the following in pandas 15.1:

# Convert Tm-column to category and in set the sorter as categories hierarchy# Youc could also do both lines in one just appending the cat.set_categories()df.Tm = df.Tm.astype("category")df.Tm.cat.set_categories(sorter, inplace=True)print(df.Tm)Out[48]: 0    CHH1    VAN2    TOT3    OKC4    DALName: Tm, dtype: categoryCategories (38, object): [TOT < ATL < BOS < BRK ... UTA < VAN < WAS < WSB]df.sort_values(["Tm"])  ## 'sort' changed to 'sort_values'Out[49]:    Age   G           Player   Tm  Year     id2   22  60      Ratko Varda  TOT  2001  139500   27   6    Cedric Hunter  CHH  1991   29674   31  81  Adrian Caldwell  DAL  1997   61693   34  52       Ryan Bowen  OKC  2009   61411   25   7    Maurice Baker  VAN  2004   5335

python sorting pandas

Below is an example that performs lexicographic sort on a dataframe.The idea is to create an numerical index based on the specific sort.Then to perform a numerical sort based on the index.A column is added to the dataframe to do so, and is then removed.

import pandas as pd# Create DataFramedf = pd.DataFrame({'id':[2967, 5335, 13950, 6141, 6169],    'Player': ['Cedric Hunter', 'Maurice Baker',               'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],    'Year': [1991, 2004, 2001, 2009, 1997],    'Age': [27, 25, 22, 34, 31],    'Tm': ['CHH' ,'VAN' ,'TOT' ,'OKC', 'DAL'],    'G': [6, 7, 60, 52, 81]})# Define the sortersorter = ['TOT', 'ATL', 'BOS', 'BRK', 'CHA', 'CHH', 'CHI', 'CLE', 'DAL','DEN',          'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL',          'MIN', 'NJN', 'NOH', 'NOK', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI',          'PHO', 'POR', 'SAC', 'SAS', 'SEA', 'TOR', 'UTA', 'VAN',          'WAS', 'WSB']# Create the dictionary that defines the order for sortingsorterIndex = dict(zip(sorter, range(len(sorter))))# Generate a rank column that will be used to sort# the dataframe numericallydf['Tm_Rank'] = df['Tm'].map(sorterIndex)# Here is the result asked with the lexicographic sort# Result may be hard to analyze, so a second sorting is# proposed next## NOTE: ## Newer versions of pandas use 'sort_values' instead of 'sort'df.sort_values(['Player', 'Year', 'Tm_Rank'],        ascending = [True, True, True], inplace = True)df.drop('Tm_Rank', 1, inplace = True)print(df)# Here is an example where 'Tm' is sorted first, that will # give the first row of the DataFrame df to contain TOT as 'Tm'df['Tm_Rank'] = df['Tm'].map(sorterIndex)## NOTE: ## Newer versions of pandas use 'sort_values' instead of 'sort'df.sort_values(['Tm_Rank', 'Player', 'Year'],        ascending = [True , True, True], inplace = True)df.drop('Tm_Rank', 1, inplace = True)print(df)

python sorting pandas

According to pandas 1.1.0 documentation, it has become possible to sort with key parameter like in sorted function (finally!). Here how we can sort by Tm

import pandas as pddata = {    'id': [2967, 5335, 13950, 6141, 6169],    'Player': ['Cedric Hunter', 'Maurice Baker',                'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],    'Year': [1991, 2004, 2001, 2009, 1997],    'Age': [27, 25, 22, 34, 31],    'Tm': ['CHH', 'VAN', 'TOT', 'OKC', 'DAL'],    'G': [6, 7, 60, 52, 81]}# Create DataFramedf = pd.DataFrame(data)def tm_sorter(column):    """Sort function"""    teams = ['TOT', 'ATL', 'BOS', 'BRK', 'CHA', 'CHH', 'CHI', 'CLE', 'DAL', 'DEN',       'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL',       'MIN', 'NJN', 'NOH', 'NOK', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI',       'PHO', 'POR', 'SAC', 'SAS', 'SEA', 'TOR', 'UTA', 'VAN',       'WAS', 'WSB']    correspondence = {team: order for order, team in enumerate(teams)}    return column.map(correspondence)df.sort_values(by='Tm', key=tm_sorter)

Sadly, it looks like we can use this feature only in sorting by 1 column (list with keys is not acceptable). It can be circumvented by groupby

df.sort_values(['Player', 'Year']) \  .groupby(['Player', 'Year']) \  .apply(lambda x: x.sort_values(by='Tm', key=tm_sorter)) \  .reset_index(drop=True)

If you know how to use key in sort_values with multiple columns, tell me please

CodeHunter

sorting by a custom list in pandas

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last