Add missing dates to pandas dataframe


You could use Series.reindex:

    import pandas as pd

    idx = pd.date_range('09-01-2013', '09-30-2013')

    s = pd.Series({'09-02-2013': 2,
                   '09-03-2013': 10,
                   '09-06-2013': 5,
                   '09-07-2013': 1})
    s.index = pd.DatetimeIndex(s.index)

    s = s.reindex(idx, fill_value=0)
    print(s)

yields

    2013-09-01     0
    2013-09-02     2
    2013-09-03    10
    2013-09-04     0
    2013-09-05     0
    2013-09-06     5
    2013-09-07     1
    2013-09-08     0
    ...
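Since the question is about a DataFrame, here is a minimal sketch of the same reindex approach applied to one. The column names `date` and `count` are made up for illustration; the idea is just to move the dates into the index first.

```python
import pandas as pd

# Hypothetical example data: a 'date' column with gaps and a 'count' column.
df = pd.DataFrame({
    'date': ['2013-09-02', '2013-09-03', '2013-09-06', '2013-09-07'],
    'count': [2, 10, 5, 1],
})

# Parse the dates and move them into the index so reindex can align on them.
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Build the full daily range and reindex; days that were missing get count 0.
idx = pd.date_range('2013-09-01', '2013-09-30', freq='D')
filled = df.reindex(idx, fill_value=0)

print(filled.head(8))
```

If the dates are needed as a column again afterwards, `filled.rename_axis('date').reset_index()` should bring them back.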


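A quicker alternative is .asfreq(), which does not require building a new index to pass to .reindex(). It only covers the range between the first and last existing dates, and the rows it adds are filled with NaN by default.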

    import pandas as pd

    # "broken" (staggered) dates
    dates = pd.Index([pd.Timestamp('2012-05-01'),
                      pd.Timestamp('2012-05-04'),
                      pd.Timestamp('2012-05-06')])
    s = pd.Series([1, 2, 3], dates)

    print(s.asfreq('D'))

yields

    2012-05-01    1.0
    2012-05-02    NaN
    2012-05-03    NaN
    2012-05-04    2.0
    2012-05-05    NaN
    2012-05-06    3.0
    Freq: D, dtype: float64
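If zeros are wanted instead of NaN, asfreq also accepts a fill_value argument. A small sketch, using the same three dates as above:

```python
import pandas as pd

dates = pd.DatetimeIndex(['2012-05-01', '2012-05-04', '2012-05-06'])
s = pd.Series([1, 2, 3], index=dates)

# fill_value is applied only to the rows that asfreq introduces;
# NaNs that were already in the data would be left alone.
filled = s.asfreq('D', fill_value=0)

print(filled)
```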


One issue is that reindex will fail if the existing index contains duplicate values. Say we're working with timestamped data, which we want to index by date:

    df = pd.DataFrame({
        'timestamps': pd.to_datetime(
            ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
        'values': ['a', 'b', 'c', 'd']})
    df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
    df

yields

                timestamps             values
    2016-11-15  "2016-11-15 01:00:00"  a
    2016-11-16  "2016-11-16 02:00:00"  b
    2016-11-16  "2016-11-16 03:00:00"  c
    2016-11-18  "2016-11-18 04:00:00"  d

Due to the duplicate 2016-11-16 date, an attempt to reindex:

    all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
    df.reindex(all_days)

fails with:

    ...
    ValueError: cannot reindex from a duplicate axis

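(The message means that the index being reindexed contains duplicate labels, not that the index is itself a duplicate of something else. Recent pandas versions word it as "cannot reindex on an axis with duplicate labels".)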
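Before reindexing, it can help to check whether the index has duplicates and which labels are affected. A rough sketch, rebuilding a small frame with the same dates as above:

```python
import pandas as pd

# Same shape of data as above: two rows fall on 2016-11-16.
index = pd.DatetimeIndex(['2016-11-15', '2016-11-16', '2016-11-16', '2016-11-18'])
df = pd.DataFrame({'values': ['a', 'b', 'c', 'd']}, index=index)

# True when at least one label occurs more than once.
has_dupes = df.index.has_duplicates

# keep=False marks every occurrence of a repeated label, not just the later ones.
dup_labels = df.index[df.index.duplicated(keep=False)].unique()

print(has_dupes)
print(dup_labels)
```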

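Instead, we can use .loc to look up entries for all dates in range. Note that this relies on older pandas behavior: since pandas 1.0, passing labels that are missing from the index to .loc raises a KeyError.

    df.loc[all_days]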

yields

                timestamps             values
    2016-11-15  "2016-11-15 01:00:00"  a
    2016-11-16  "2016-11-16 02:00:00"  b
    2016-11-16  "2016-11-16 03:00:00"  c
    2016-11-17  NaN                    NaN
    2016-11-18  "2016-11-18 04:00:00"  d
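In pandas 1.0 and later, .loc raises a KeyError when any requested label is missing from the index, so the lookup above may not run on a current install. One possible alternative (not part of the original answer) is to left-join the data onto an empty frame indexed by every day in the range. The index name `day` below is just chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 01:00', '2016-11-16 02:00', '2016-11-16 03:00', '2016-11-18 04:00']),
    'values': ['a', 'b', 'c', 'd'],
})
# Index by calendar day; 2016-11-16 appears twice.
df.index = pd.DatetimeIndex(df['timestamps']).floor('D').rename('day')

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')

# Left join on the index: every day in all_days is kept, duplicate days in df
# produce one row each, and days with no data come out as NaN/NaT.
result = pd.DataFrame(index=all_days).join(df)

print(result)
```

This should give the same five rows as the output shown above, with 2016-11-17 present but empty.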

fillna can be used on the column series to fill blanks if needed.
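For instance, a minimal sketch of filling the gap in the values column. The frame is rebuilt here with the missing day already present, and the placeholder string 'missing' is an arbitrary choice:

```python
import numpy as np
import pandas as pd

days = pd.DatetimeIndex(
    ['2016-11-15', '2016-11-16', '2016-11-16', '2016-11-17', '2016-11-18'])
df = pd.DataFrame({'values': ['a', 'b', 'c', np.nan, 'd']}, index=days)

# Fill only this column; other columns keep their NaN/NaT entries.
df['values'] = df['values'].fillna('missing')

print(df)
```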