Pandas uses substantially more memory for storage than asked for
So I make an 8000-byte array (1000 float64 values):
    In [248]: x = np.ones(1000)

    In [249]: df = pd.DataFrame({'MyCol': x}, dtype=float)

    In [250]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 1000 entries, 0 to 999
    Data columns (total 1 columns):
    MyCol    1000 non-null float64
    dtypes: float64(1)
    memory usage: 15.6 KB
So that's 8 KB for the data and 8 KB for the Int64Index, which accounts for the reported 15.6 KB (16000 bytes).
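The breakdown above can be checked directly with `memory_usage()`, which itemizes the index and each column in bytes. A minimal sketch (note: newer pandas versions default to a `RangeIndex`, which reports far fewer bytes than the `Int64Index` shown above):

    import numpy as np
    import pandas as pd

    x = np.ones(1000)                 # 1000 float64 values -> 8000 bytes
    df = pd.DataFrame({'MyCol': x})

    # Itemized usage in bytes: one row for the index, one per column
    print(df.memory_usage())
    print(x.nbytes)                   # 8000: the column holds the same 8000 bytes
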
I add a column, and usage increases by the size of x:
    In [251]: df['col2'] = x

    In [252]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 1000 entries, 0 to 999
    Data columns (total 2 columns):
    MyCol    1000 non-null float64
    col2     1000 non-null float64
    dtypes: float64(2)
    memory usage: 23.4 KB

    In [253]: x.nbytes
    Out[253]: 8000
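The same increase can be measured programmatically by diffing `memory_usage().sum()` before and after the assignment. A minimal sketch, assuming only numpy and pandas:

    import numpy as np
    import pandas as pd

    x = np.ones(1000)
    df = pd.DataFrame({'MyCol': x})
    before = df.memory_usage().sum()

    df['col2'] = x                    # column assignment copies x into the frame
    after = df.memory_usage().sum()

    print(after - before)             # 8000: exactly x.nbytes, the index is not duplicated

This shows the index cost is paid once; each additional float64 column adds only its own `nbytes`.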