How do I perform ordered selection on multiple Columns by Value


Or, if you are looking for rows that fall between 2008 and 2013, as asked in the post ("select all rows between September 2013 and May 2008"), you could use pandas.Series.between:

Dataset borrowed from @jezrael's answer.

DataFrame for demonstration purposes:

>>> stats_month_census_2
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7

Using pandas.Series.between()

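>>> stats_month_census_2[stats_month_census_2['year'].between(2008, 2013, inclusive="both")]
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5

In pandas 1.3 and later, inclusive takes "both" (the default), "neither", "left" or "right"; the older boolean form inclusive=True is deprecated and rejected by recent versions.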
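A self-contained sketch of the integer-year case, rebuilding the sample data shown above so it runs on its own. It leaves inclusive at its default, which keeps both endpoints on old and new pandas versions alike.

```python
import pandas as pd

# Same sample data as above: integer years plus month names.
stats_month_census_2 = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# between() includes both endpoints by default, so 2008 and 2013 rows are kept.
mask = stats_month_census_2['year'].between(2008, 2013)
result = stats_month_census_2[mask]
print(result)
```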

If the year column has already been converted to datetime (see the note further down), you can pass date strings instead:

>>> stats_month_census_2[stats_month_census_2['year'].between('2008-05', '2013-09', inclusive="both")]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Using DataFrame.query:

>>> stats_month_census_2.query('"2008-05" <= year <= "2013-09"')
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
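A self-contained sketch of the query approach. As an assumption for illustration, it builds the datetime in a new column named date (a hypothetical name, rather than overwriting year) from the year and month-name columns using format='%Y%B'.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# "2008" + "April" -> "2008April", parsed as the first day of that month.
df['date'] = pd.to_datetime(df['year'].astype(str) + df['month'], format='%Y%B')

# Chained comparisons are allowed in query(); the date strings are
# compared as timestamps ("2013-09" means 2013-09-01).
result = df.query('"2008-05" <= date <= "2013-09"')
print(result)
```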

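Using the isin method to select the rows between two dates. pd.date_range is daily by default, so this matches only because every value in the column falls exactly on one of the generated days: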

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05-01', '2013-09-01'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Or you can even pass year-month strings:

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05', '2013-09'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
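Since every date here is a month start, a daily range generates far more timestamps than needed. A sketch, assuming the year column already holds month-start datetimes, that uses freq='MS' (month start) so the range has one entry per month:

```python
import pandas as pd

stats_month_census_2 = pd.DataFrame({
    'year': pd.to_datetime(['2008-04-01', '2008-05-01', '2008-06-01', '2013-09-01',
                            '2013-10-01', '2014-11-01', '2014-12-01']),
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# One timestamp per month start from May 2008 through September 2013.
months = pd.date_range('2008-05-01', '2013-09-01', freq='MS')

# isin() matches exact timestamps, which works because every date is a month start.
result = stats_month_census_2[stats_month_census_2['year'].isin(months)]
print(result)
```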

Using loc to slice between the index labels of the start and end dates:

Start = stats_month_census_2[stats_month_census_2['year'] == '2008-05'].index[0]
End = stats_month_census_2[stats_month_census_2['year'] == '2013-09'].index[0]
>>> stats_month_census_2.loc[Start:End]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
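A self-contained sketch of the loc approach. It assumes the rows are already in date order and that both boundary dates actually occur in the column; otherwise index[0] raises an IndexError.

```python
import pandas as pd

stats_month_census_2 = pd.DataFrame({
    'year': pd.to_datetime(['2008-04-01', '2008-05-01', '2008-06-01', '2013-09-01',
                            '2013-10-01', '2014-11-01', '2014-12-01']),
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

df = stats_month_census_2
# Index label of the first row matching each boundary date.
start = df[df['year'] == pd.Timestamp('2008-05-01')].index[0]
end = df[df['year'] == pd.Timestamp('2013-09-01')].index[0]

# Label-based slicing with loc includes both the start and end labels.
result = df.loc[start:end]
print(result)
```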

Note: since @jezrael asked in a comment, here is how I converted the year column into datetime format:

The example DataFrame below has two separate columns: year, which holds only years, and month, which holds month names as strings. So we first convert the month names to integers, then combine year and month, with the day set to 1 for every row, using pd.to_datetime.

df
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7

Above is the raw DataFrame before the datetime conversion. I use the following approach, which I picked up over time on Stack Overflow itself.

1- First, convert the month names to integers and store them in a new column called Month, so we can use it for the conversion later.

df['Month'] = pd.to_datetime(df.month, format='%B').dt.month

2- Finally, convert the year column itself into a proper datetime by assembling year, Month and a day of 1, and assign the result back to year (effectively an in-place conversion).

df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))

Now the year column is in datetime form:

print(df)
        year      month  data  Month
0 2008-04-01      April     1      4
1 2008-05-01        May     3      5
2 2008-06-01       June     4      6
3 2013-09-01  September     6      9
4 2013-10-01    October     5     10
5 2014-11-01   November     6     11
6 2014-12-01   December     7     12
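The two conversion steps above, combined into one self-contained sketch starting from the raw DataFrame. pd.to_datetime can assemble a datetime from a DataFrame whose columns are named year, month and day; the capitalized Month and Day names are matched case-insensitively.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# Step 1: month name -> month number ('%B' parses full month names).
df['Month'] = pd.to_datetime(df['month'], format='%B').dt.month

# Step 2: assemble year + Month + a fixed Day of 1 into a datetime
# and overwrite the year column with it.
df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))
print(df)
```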


You can convert the month and year columns into a single datetime column using pd.to_datetime and then filter with ordinary comparisons:

>>> df
       month  year
0    January  2000
1      April  2001
2       July  2002
3   February  2010
4   February  2018
5      March  2014
6       June  2012
7       June  2011
8        May  2009
9   November  2016
>>> df['date'] = pd.to_datetime(df['month'].astype(str) + '-' + df['year'].astype(str), format='%B-%Y')
>>> df
       month  year       date
0    January  2000 2000-01-01
1      April  2001 2001-04-01
2       July  2002 2002-07-01
3   February  2010 2010-02-01
4   February  2018 2018-02-01
5      March  2014 2014-03-01
6       June  2012 2012-06-01
7       June  2011 2011-06-01
8        May  2009 2009-05-01
9   November  2016 2016-11-01
>>> df[(df.date <= "2013-09") & (df.date >= "2008-05")]
      month  year       date
3  February  2010 2010-02-01
6      June  2012 2012-06-01
7      June  2011 2011-06-01
8       May  2009 2009-05-01
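A self-contained sketch of this answer's approach. Because the question asks for an ordered selection, it additionally sorts the filtered rows by date with sort_values; that sorting step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['January', 'April', 'July', 'February', 'February',
              'March', 'June', 'June', 'May', 'November'],
    'year': [2000, 2001, 2002, 2010, 2018, 2014, 2012, 2011, 2009, 2016],
})

# "January-2000" -> 2000-01-01
df['date'] = pd.to_datetime(df['month'] + '-' + df['year'].astype(str), format='%B-%Y')

# Keep rows from 2008-05-01 through 2013-09-01, then put them in date order.
mask = (df['date'] >= '2008-05') & (df['date'] <= '2013-09')
result = df[mask].sort_values('date')
print(result)
```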


You can create a DatetimeIndex and then select by partial string indexing:

stats_month_census_2 = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013],
    'month': ['April', 'May', 'June', 'September', 'October'],
    'data': [1, 3, 4, 6, 5]})
print (stats_month_census_2)
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5

s = stats_month_census_2.pop('year').astype(str) + stats_month_census_2.pop('month')
#if need year and month columns
#s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
stats_month_census_2.index = pd.to_datetime(s, format='%Y%B')
print (stats_month_census_2)
            data
2008-04-01     1
2008-05-01     3
2008-06-01     4
2013-09-01     6
2013-10-01     5

print (stats_month_census_2['2008':'2013'])
            data
2008-04-01     1
2008-05-01     3
2008-06-01     4
2013-09-01     6
2013-10-01     5

print (stats_month_census_2['2008-05':'2013-09'])
            data
2008-05-01     3
2008-06-01     4
2013-09-01     6
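A self-contained sketch of the same partial string indexing, written with .loc, which makes it explicit that the slice is on row labels. A partial string used as the end bound covers the whole period it names, so '2013' includes every 2013 row and '2013-09' includes all of September 2013.

```python
import pandas as pd

stats_month_census_2 = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013],
    'month': ['April', 'May', 'June', 'September', 'October'],
    'data': [1, 3, 4, 6, 5]})

# Move year + month into a sorted DatetimeIndex.
s = stats_month_census_2.pop('year').astype(str) + stats_month_census_2.pop('month')
stats_month_census_2.index = pd.to_datetime(s, format='%Y%B')

by_year = stats_month_census_2.loc['2008':'2013']
by_month = stats_month_census_2.loc['2008-05':'2013-09']
print(by_year)
print(by_month)
```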

Or create a date column and use between with boolean indexing:

s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
stats_month_census_2['date'] = pd.to_datetime(s, format='%Y%B')
print (stats_month_census_2)
   year      month  data       date
0  2008      April     1 2008-04-01
1  2008        May     3 2008-05-01
2  2008       June     4 2008-06-01
3  2013  September     6 2013-09-01
4  2013    October     5 2013-10-01

df = stats_month_census_2[stats_month_census_2['date'].between('2008-05', '2013-09')]
print (df)
   year      month  data       date
1  2008        May     3 2008-05-01
2  2008       June     4 2008-06-01
3  2013  September     6 2013-09-01

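Unfortunately, this approach with a datetime column does not work for selecting between whole years: '2013' is parsed as 2013-01-01, so September and October 2013 are excluded. In that case you need pygo's solution with the integer year column: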

#wrong output
df = stats_month_census_2[stats_month_census_2['date'].between('2008', '2013')]
print (df)
   year  month  data       date
0  2008  April     1 2008-04-01
1  2008    May     3 2008-05-01
2  2008   June     4 2008-06-01
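One workaround, not taken from the answers above, is to keep the datetime column but compare only its year component through the .dt.year accessor, so all of 2013 is included. A sketch rebuilding the five-row sample:

```python
import pandas as pd

stats_month_census_2 = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013],
    'month': ['April', 'May', 'June', 'September', 'October'],
    'data': [1, 3, 4, 6, 5]})

s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
stats_month_census_2['date'] = pd.to_datetime(s, format='%Y%B')

# Compare integer years extracted from the datetime instead of full timestamps.
years = stats_month_census_2['date'].dt.year
df = stats_month_census_2[years.between(2008, 2013)]
print(df)
```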