How do I perform ordered selection on multiple Columns by Value
If you are looking for rows whose year falls between 2008 and 2013 (the post asks to "select all rows between September 2013 and May 2008"), you can use pandas.Series.between:
Dataset borrowed from @jezrael.
DataFrame for demonstration purposes:
    >>> stats_month_census_2
       year      month  data
    0  2008      April     1
    1  2008        May     3
    2  2008       June     4
    3  2013  September     6
    4  2013    October     5
    5  2014   November     6
    6  2014   December     7
Using pandas.Series.between()
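    >>> # pandas >= 1.3 expects inclusive='both' (the default); older versions accepted inclusive=True
    >>> stats_month_census_2[stats_month_census_2['year'].between(2008, 2013, inclusive='both')]
       year      month  data
    0  2008      April     1
    1  2008        May     3
    2  2008       June     4
    3  2013  September     6
    4  2013    October     5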
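For readers who want to run this end to end, here is a minimal, self-contained sketch of the same year filter. The frame is rebuilt by hand from the demonstration data (the name `df` is just a stand-in), and it relies on `between` being inclusive on both ends by default.

```python
import pandas as pd

# Hand-built copy of the demonstration data; `df` is a stand-in name.
df = pd.DataFrame({
    "year": [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    "month": ["April", "May", "June", "September",
              "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# between() keeps both endpoints unless told otherwise.
selected = df[df["year"].between(2008, 2013)]

# The same mask written with explicit comparisons.
explicit = df[(df["year"] >= 2008) & (df["year"] <= 2013)]

print(selected)
```

Both `selected` and `explicit` should hold the same rows, so which spelling to use is mostly a matter of taste.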
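If the year column is already in datetime format, you can simply pass date strings as the bounds:
    >>> # pandas >= 1.3 expects inclusive='both' (the default); older versions accepted inclusive=True
    >>> stats_month_census_2[stats_month_census_2['year'].between('2008-05', '2013-09', inclusive='both')]
            year      month  data
    1 2008-05-01        May     3
    2 2008-06-01       June     4
    3 2013-09-01  September     6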
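As a rough, runnable illustration of the datetime case, the sketch below builds the `year` column as datetimes directly and passes explicit `pd.Timestamp` bounds instead of strings. The names `start` and `end` are assumptions introduced here, not part of the original answer.

```python
import pandas as pd

# Demonstration data with the `year` column already stored as datetimes.
df = pd.DataFrame({
    "year": pd.to_datetime([
        "2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
        "2013-10-01", "2014-11-01", "2014-12-01",
    ]),
    "month": ["April", "May", "June", "September",
              "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Explicit Timestamp bounds make the cut-off dates unambiguous.
start = pd.Timestamp("2008-05-01")
end = pd.Timestamp("2013-09-01")

selected = df[df["year"].between(start, end)]
print(selected)
```

Using full timestamps avoids any doubt about how a partial string such as '2013-09' is interpreted.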
Using DataFrame.query:
    >>> stats_month_census_2.query('"2008-05" <= year <= "2013-09"')
            year      month  data
    1 2008-05-01        May     3
    2 2008-06-01       June     4
    3 2013-09-01  September     6
Using the isin method to select the rows between two dates:
    >>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05-01', '2013-09-01'))]
            year      month  data
    1 2008-05-01        May     3
    2 2008-06-01       June     4
    3 2013-09-01  September     6
Or you can pass the dates in the shorter year-month form:
    >>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05', '2013-09'))]
            year      month  data
    1 2008-05-01        May     3
    2 2008-06-01       June     4
    3 2013-09-01  September     6
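One thing worth keeping in mind with the `isin` approach is that it tests exact membership in the generated dates rather than a range. The sketch below is an illustration under assumptions: the `dates` series (including a value with a midday time component) is made up for the example, and `freq="MS"` (month start) is used to keep the generated range small.

```python
import pandas as pd

# Made-up dates; the second one carries a time of day on purpose.
dates = pd.Series([
    pd.Timestamp("2008-04-01"),
    pd.Timestamp("2008-05-01 12:00"),
    pd.Timestamp("2013-09-01"),
    pd.Timestamp("2013-10-01"),
])

# Month-start timestamps from May 2008 through September 2013.
month_starts = pd.date_range("2008-05-01", "2013-09-01", freq="MS")

# isin() needs an exact match, so the midday value is not selected.
by_isin = dates[dates.isin(month_starts)]

# between() is a true range test, so the midday value is selected.
by_between = dates[dates.between(pd.Timestamp("2008-05-01"),
                                 pd.Timestamp("2013-09-01"))]

print(by_isin)
print(by_between)
```

For data that always falls on the first of the month at midnight, as in this question, the two approaches should agree.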
Using the loc method, slicing on the index labels of the start and end dates:
    >>> Start = stats_month_census_2[stats_month_census_2['year'] == '2008-05'].index[0]
    >>> End = stats_month_census_2[stats_month_census_2['year'] == '2013-09'].index[0]
    >>> stats_month_census_2.loc[Start:End]
            year      month  data
    1 2008-05-01        May     3
    2 2008-06-01       June     4
    3 2013-09-01  September     6
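The label-slicing idea can be sketched as a standalone script. This is only an illustration: the trimmed frame and the lowercase names `start` and `end` are assumptions, and the approach presumes the rows are already ordered by date, since `loc` slices by position of the labels in the index.

```python
import pandas as pd

# Trimmed copy of the demonstration data, already sorted by date.
df = pd.DataFrame({
    "year": pd.to_datetime([
        "2008-04-01", "2008-05-01", "2008-06-01",
        "2013-09-01", "2013-10-01",
    ]),
    "data": [1, 3, 4, 6, 5],
})

# Index labels of the first rows matching each boundary date.
start = df.index[df["year"] == pd.Timestamp("2008-05-01")][0]
end = df.index[df["year"] == pd.Timestamp("2013-09-01")][0]

# Label-based slicing with loc includes both endpoints.
window = df.loc[start:end]
print(window)
```

If a boundary date is missing from the data, the `[0]` lookup raises an `IndexError`, so a range test such as `between` is usually the more forgiving choice.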
Note: since @jezrael asked in a comment, I'm adding how to convert the year column into datetime format.
The example DataFrame below has two distinct columns, year and month: the year column holds only years, and the month column holds month names as strings. So we first convert the month names into integers, then combine year and month, assigning 1 as the day for every row, using the pd.to_datetime method.
    >>> df
       year      month  data
    0  2008      April     1
    1  2008        May     3
    2  2008       June     4
    3  2013  September     6
    4  2013    October     5
    5  2014   November     6
    6  2014   December     7
Above is the raw DataFrame before the datetime conversion. I'm taking the approach below, which I learned over time via SO itself.
1- First, convert the month names into integers and assign them to a new column called Month, so that we can use it for the conversion later.
    df['Month'] = pd.to_datetime(df.month, format='%B').dt.month
2- Second, convert the year column into a proper datetime format by assigning the result back to the year column itself, which is effectively an in-place conversion.
    df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))
Now we have the desired DataFrame, with the year column in datetime form:
    >>> print(df)
            year      month  data  Month
    0 2008-04-01      April     1      4
    1 2008-05-01        May     3      5
    2 2008-06-01       June     4      6
    3 2013-09-01  September     6      9
    4 2013-10-01    October     5     10
    5 2014-11-01   November     6     11
    6 2014-12-01   December     7     12
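The conversion can also be sketched without overwriting the original columns. In this hedged example the three-row frame and the names `month_number`, `parts`, and `date` are assumptions made for illustration; it also assumes an English locale, since `%B` parses full month names according to the current locale.

```python
import pandas as pd

# Small stand-in frame with integer years and month names.
df = pd.DataFrame({
    "year": [2008, 2013, 2014],
    "month": ["April", "September", "December"],
    "data": [1, 6, 7],
})

# Parse the month names and keep only the month number.
month_number = pd.to_datetime(df["month"], format="%B").dt.month

# to_datetime can assemble dates from year/month/day columns.
parts = pd.DataFrame({"year": df["year"], "month": month_number, "day": 1})
df["date"] = pd.to_datetime(parts)

print(df)
```

Writing the result to a separate `date` column leaves the integer `year` column available for plain year-range filters.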
You can easily convert the columns into a DateTime column using pd.to_datetime:
    >>> df
          month  year
    0   January  2000
    1     April  2001
    2      July  2002
    3  February  2010
    4  February  2018
    5     March  2014
    6      June  2012
    7      June  2011
    8       May  2009
    9  November  2016
    >>> df['date'] = pd.to_datetime(df['month'].astype(str) + '-' + df['year'].astype(str), format='%B-%Y')
    >>> df
          month  year       date
    0   January  2000 2000-01-01
    1     April  2001 2001-04-01
    2      July  2002 2002-07-01
    3  February  2010 2010-02-01
    4  February  2018 2018-02-01
    5     March  2014 2014-03-01
    6      June  2012 2012-06-01
    7      June  2011 2011-06-01
    8       May  2009 2009-05-01
    9  November  2016 2016-11-01
    >>> df[(df.date <= "2013-09") & (df.date >= "2008-05")]
          month  year       date
    3  February  2010 2010-02-01
    6      June  2012 2012-06-01
    7      June  2011 2011-06-01
    8       May  2009 2009-05-01
You can create a DatetimeIndex and then select by partial string indexing:
    stats_month_census_2 = pd.DataFrame({
        'year': [2008, 2008, 2008, 2013, 2013],
        'month': ['April', 'May', 'June', 'September', 'October'],
        'data': [1, 3, 4, 6, 5]
    })
    print(stats_month_census_2)

       year      month  data
    0  2008      April     1
    1  2008        May     3
    2  2008       June     4
    3  2013  September     6
    4  2013    October     5

    s = stats_month_census_2.pop('year').astype(str) + stats_month_census_2.pop('month')
    # if you need to keep the year and month columns:
    # s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
    stats_month_census_2.index = pd.to_datetime(s, format='%Y%B')
    print(stats_month_census_2)

                data
    2008-04-01     1
    2008-05-01     3
    2008-06-01     4
    2013-09-01     6
    2013-10-01     5

    print(stats_month_census_2['2008':'2013'])

                data
    2008-04-01     1
    2008-05-01     3
    2008-06-01     4
    2013-09-01     6
    2013-10-01     5

    print(stats_month_census_2['2008-05':'2013-09'])

                data
    2008-05-01     3
    2008-06-01     4
    2013-09-01     6
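A compact way to try partial string indexing is sketched below. The frame is rebuilt directly with a `DatetimeIndex` (the names `frame`, `whole_years`, and `months` are assumptions), and `.loc` is used for the slices, which is one common spelling of the same selection.

```python
import pandas as pd

# Demonstration data indexed by month-start dates.
idx = pd.to_datetime([
    "2008-04-01", "2008-05-01", "2008-06-01",
    "2013-09-01", "2013-10-01",
])
frame = pd.DataFrame({"data": [1, 3, 4, 6, 5]}, index=idx)

# A bare year as the end label covers that whole year.
whole_years = frame.loc["2008":"2013"]

# A year-month end label covers that whole month.
months = frame.loc["2008-05":"2013-09"]

print(whole_years)
print(months)
```

Because the end label expands to the end of its period, the year slice keeps the October 2013 row while the month slice stops at September 2013.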
Or create a column and use between with boolean indexing:
    s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
    stats_month_census_2['date'] = pd.to_datetime(s, format='%Y%B')
    print(stats_month_census_2)

       year      month  data       date
    0  2008      April     1 2008-04-01
    1  2008        May     3 2008-05-01
    2  2008       June     4 2008-06-01
    3  2013  September     6 2013-09-01
    4  2013    October     5 2013-10-01

    df = stats_month_census_2[stats_month_census_2['date'].between('2008-05', '2013-09')]
    print(df)

       year      month  data       date
    1  2008        May     3 2008-05-01
    2  2008       June     4 2008-06-01
    3  2013  September     6 2013-09-01
Unfortunately, with a datetime column this approach cannot select between whole years: the bound '2013' is interpreted as 2013-01-01, so later rows in 2013 are dropped. In that case you need pygo's solution with the year column:
    # wrong output
    df = stats_month_census_2[stats_month_census_2['date'].between('2008', '2013')]
    print(df)

       year  month  data       date
    0  2008  April     1 2008-04-01
    1  2008    May     3 2008-05-01
    2  2008   June     4 2008-06-01
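If only a datetime column is available, one possible workaround (not part of the original answers) is to compare on the year component through the `.dt.year` accessor. The sketch below is written under that assumption; the names `too_narrow` and `by_year` are made up for the example.

```python
import pandas as pd

# Demonstration data with only a datetime column and the values.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2008-04-01", "2008-05-01", "2008-06-01",
        "2013-09-01", "2013-10-01",
    ]),
    "data": [1, 3, 4, 6, 5],
})

# Reproduces the problem: the upper bound is the first day of 2013.
too_narrow = df[df["date"].between(pd.Timestamp("2008-01-01"),
                                   pd.Timestamp("2013-01-01"))]

# Comparing the year component keeps every row dated in 2013.
by_year = df[df["date"].dt.year.between(2008, 2013)]

print(too_narrow)
print(by_year)
```

An alternative with the same effect would be to use an upper bound at the end of 2013, but filtering on the year number states the intent more directly.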