Add missing dates to pandas dataframe
You could use Series.reindex:
import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)
yields
2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...
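The same idea carries over to a DataFrame whose dates live in a column. The following is a minimal sketch, assuming a hypothetical frame with a `date` column and a `visits` column (both names are illustrative, not from the original answer): index by the date column, reindex against the full range, then turn the index back into a column.

```python
import pandas as pd

# Hypothetical data: 'date' and 'visits' are illustrative column names.
df = pd.DataFrame({
    'date': ['2013-09-02', '2013-09-03', '2013-09-06', '2013-09-07'],
    'visits': [2, 10, 5, 1],
})
df['date'] = pd.to_datetime(df['date'])

# Every calendar day in the period we want to cover.
idx = pd.date_range('2013-09-01', '2013-09-30', freq='D')

filled = (
    df.set_index('date')           # dates become the index
      .reindex(idx, fill_value=0)  # add missing days, filled with 0
      .rename_axis('date')         # the new index has no name, so restore it
      .reset_index()               # move the dates back into a column
)
print(filled.head(8))
```

Note that `fill_value=0` is applied to every column, so for frames with mixed column types it may be preferable to omit it and fill each column separately.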
A quicker workaround is to use .asfreq(). This doesn't require creating a new index to pass to .reindex(). Note that it only fills gaps between the first and last dates already present in the index.
# "broken" (staggered) dates
dates = pd.Index([pd.Timestamp('2012-05-01'),
                  pd.Timestamp('2012-05-04'),
                  pd.Timestamp('2012-05-06')])
s = pd.Series([1, 2, 3], dates)
print(s.asfreq('D'))
yields
2012-05-01    1.0
2012-05-02    NaN
2012-05-03    NaN
2012-05-04    2.0
2012-05-05    NaN
2012-05-06    3.0
Freq: D, dtype: float64
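If NaN placeholders (and the resulting float conversion) are not wanted, `asfreq` also accepts a `fill_value` argument. A small sketch, reusing the dates from the example above; the variable name `filled` is illustrative:

```python
import pandas as pd

# Same staggered dates as above.
dates = pd.DatetimeIndex(['2012-05-01', '2012-05-04', '2012-05-06'])
s = pd.Series([1, 2, 3], index=dates)

# Insert the missing days and fill them with 0 instead of NaN.
filled = s.asfreq('D', fill_value=0)
print(filled)
```

`fill_value` only applies to the rows that `asfreq` inserts; NaN values already present in the data are left untouched.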
One issue is that reindex will fail if the index contains duplicate values. Say we're working with timestamped data, which we want to index by date:
df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
    'values': ['a', 'b', 'c', 'd']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
df
yields
            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-18  "2016-11-18 04:00:00"  d
Due to the duplicate 2016-11-16 date, an attempt to reindex:
all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
df.reindex(all_days)
fails with:
...ValueError: cannot reindex from a duplicate axis
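(This means the index itself contains duplicate labels, not that the index is a duplicate of something else. Newer pandas versions word the message as "cannot reindex on an axis with duplicate labels".)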
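Before reindexing, it can help to check whether the index has duplicates and, if one row per day is acceptable, to collapse them first. The sketch below assumes that counting rows per day is a suitable aggregation, which is an illustrative choice rather than part of the original answer; `dup_days` and `per_day` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
    'values': ['a', 'b', 'c', 'd']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')

# Detect the problem up front instead of waiting for the ValueError.
print(df.index.has_duplicates)
dup_days = df.index[df.index.duplicated()].unique()
print(dup_days)

# One possible fix: aggregate to one row per day (here, a row count),
# after which the index is unique and reindex works.
per_day = df.groupby(level=0)['values'].count()
all_days = pd.date_range(per_day.index.min(), per_day.index.max(), freq='D')
per_day = per_day.reindex(all_days, fill_value=0)
print(per_day)
```

This trades the individual rows for a per-day summary, so it only fits cases where the duplicates do not need to be preserved.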
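Instead, in older pandas versions (before 1.0) we can use .loc to look up entries for all dates in the range; pandas 1.0 and later raise a KeyError when any requested label is missing from the index: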
df.loc[all_days]
yields
            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-17  NaN                    NaN
2016-11-18  "2016-11-18 04:00:00"  d
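On recent pandas releases the lookup above raises a KeyError because 2016-11-17 is not in the index. One alternative that keeps the duplicate rows is a left join onto an empty frame indexed by every day. This is a sketch of a swapped-in technique, not the original answer's method, and the name `expanded` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
    'values': ['a', 'b', 'c', 'd']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')

# Left-join the data onto a column-less frame that has one row per day.
# Days with several rows in df appear several times; days with none get NaN/NaT.
expanded = pd.DataFrame(index=all_days).join(df)
print(expanded)
```

Because the join is index-to-index, duplicate dates in `df` are preserved rather than rejected.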
fillna can be used on the column series to fill blanks if needed.
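As a brief illustration of that last step, the sketch below rebuilds a small frame with one blank row by hand and fills a single column. The placeholder string 'missing' and the name `expanded` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hand-built frame mirroring the result above: 2016-11-17 has no data.
expanded = pd.DataFrame(
    {'values': ['a', 'b', 'c', np.nan, 'd']},
    index=pd.to_datetime(['2016-11-15', '2016-11-16', '2016-11-16',
                          '2016-11-17', '2016-11-18']))

# Fill blanks in one column only; other columns are left as they are.
expanded['values'] = expanded['values'].fillna('missing')
print(expanded)
```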