Pandas Dataframe - Droping Certain Hours of the Day from 20 Years of Historical Data Pandas Dataframe - Droping Certain Hours of the Day from 20 Years of Historical Data numpy numpy

Pandas Dataframe - Droping Certain Hours of the Day from 20 Years of Historical Data


Problem here is how you are importing data. There is no indicator whether 04:00 is am or pm? but based on your comments we need to assume it is PM. However input is showing it as AM.

To solve this we need to include two conditions with OR clause.

  1. 9:30-11:59
  2. 0:00-4:00

Input:

df = pd.DataFrame({'date':   {880551: '2015-07-06 04:00:00', 880552: '2015-07-06 04:02:00',880553: '2015-07-06 04:03:00', 880554: '2015-07-06 04:04:00', 880555: '2015-07-06 04:05:00'},                   'open':   {880551: 125.00, 880552: 125.36,880553: 125.34, 880554: 125.08, 880555: 125.12},                   'high':   {880551: 125.00, 880552: 125.36,880553: 125.34, 880554: 125.11, 880555: 125.12},                   'low':    {880551: 125.00, 880552: 125.32,880553: 125.21, 880554: 125.05, 880555: 125.12},                   'close':  {880551: 125.00, 880552: 125.32,880553: 125.21, 880554: 125.05, 880555: 125.12},                   'volume': {880551: 141, 880552: 200,880553: 750, 880554: 17451, 880555: 1000},                   },                   )df.head()    date    open    high    low close   volume880551  2015-07-06 04:00:00 125.00  125.00  125.00  125.00  141880552  2015-07-06 04:02:00 125.36  125.36  125.32  125.32  200880553  2015-07-06 04:03:00 125.34  125.34  125.21  125.21  750880554  2015-07-06 04:04:00 125.08  125.11  125.05  125.05  17451880555  2015-07-06 04:05:00 125.12  125.12  125.12  125.12  1000from datetime import timestart_first = time(9, 30)end_first = time(11, 59)start_second = time(0, 00)end_second = time(4,00)df['date'] = pd.to_datetime(df['date'])df= df[(df['date'].dt.time.between(start_first, end_first)) | (df['date'].dt.time.between(start_second, end_second))]dfdate    open    high    low close   volume880551  2015-07-06 04:00:00 125.0   125.0   125.0   125.0   141

Above is not good practice, and I strongly discourage to use this kind of ambiguous data. long time solution is to correctly populate data with am/pm.

We can achieve it in two way in case of correct data format:

1) using datetime

from datetime import timestart = time(9, 30)end = time(16)df['date'] = pd.to_datetime(df['date'])df= df[df['date'].dt.time.between(start, end)]

2) using between time, which only works with datetime index

df['date'] = pd.to_datetime(df['date'])df = (df.set_index('date')          .between_time('09:30', '16:00')          .reset_index())

If you still face error, edit your question with line by line approach and exact error.


I think the answer is already in the comments (@Parfait's .between_time) but that it got lost in debugging issues. It appears your df['date'] column is not of type Datetime yet.

This should be enough to fix that and get the required result:

df['date'] = pd.to_datetime(df['date'])df = df.set_index('date')df = df.between_time('9:30', '16:00')


This example code consolidates the answers provided by Bhavesh Ghodasara, Parfait and jorijnsmit into one complete, commented example:

import pandas as pd# example dataframe containing 6 records: 2 days of 3 records each in which all cases are covered:# each day has one record before trading hours, one record during trading hours and one recrod after trading hoursdf = pd.DataFrame({'date':   {0: '2015-07-06 08:00:00', 1: '2015-07-06 13:00:00', 2: '2015-07-06 18:00:00',                               3: '2015-07-07 08:00:00', 4: '2015-07-07 13:00:00', 5: '2015-07-07 18:00:00'},                   'open':   {0: 125.00, 1: 125.36, 2: 125.34, 3: 125.08, 4: 125.12, 5: 125.37},                   'high':   {0: 125.00, 1: 125.36, 2: 125.34, 3: 125.08, 4: 125.12, 5: 125.37},                   'low':    {0: 125.00, 1: 125.36, 2: 125.34, 3: 125.08, 4: 125.12, 5: 125.37},                   'close':  {0: 125.00, 1: 125.36, 2: 125.34, 3: 125.08, 4: 125.12, 5: 125.37},                   'volume': {0: 141, 1: 200, 2: 750, 3: 17451, 4: 1000, 5: 38234},                   },                   )# inspect the example data setdf.head(6)# first, ensure that the 'date' column is of the correct data type: MAKE IT SO!df['date'] = pd.to_datetime(df['date'])# inspect the data types: date column should be of type 'datetime64[ns]'print(df.dtypes)# set the index of the dataframe to the datetime-type column 'data'df = df.set_index('date')# inspect the index: it should be a DatetimeIndex of dtype 'datetime64[ns]'print(df.index)# filter the data setdf_filtered = df.between_time('9:30', '16:00')# inspect the filtered data set: VoilĂ ! No more outside trading hours records.df_filtered.head()