Why is Pandas Concatenation (pandas.concat) so Memory Inefficient?
It looks like you are concatenating row-wise, even though your text indicates that you want column-wise. Specify axis=1.
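To make the difference concrete, here is a minimal sketch with two tiny made-up frames (the names a, b and their column labels are illustrative, not taken from the question). It assumes the frames share a row index but have different column labels, which is one situation where the row-wise default pads the result with NaN and grows it well beyond the input data.

```python
import numpy as np
import pandas as pd

# Two small illustrative frames: same row index, different column labels.
a = pd.DataFrame(np.zeros((3, 2)), columns=["a0", "a1"])
b = pd.DataFrame(np.ones((3, 2)), columns=["b0", "b1"])

rowwise = pd.concat([a, b])          # axis=0 is the default: rows are stacked
colwise = pd.concat([a, b], axis=1)  # axis=1: frames are placed side by side

rowwise_shape = rowwise.shape  # 6 rows, union of the 4 column labels
colwise_shape = colwise.shape  # 3 rows, 4 columns

# Row-wise, each frame has no data for the other frame's columns,
# so those cells are filled with NaN.
rowwise_nans = int(rowwise.isna().sum().sum())
colwise_nans = int(colwise.isna().sum().sum())
```

Here the row-wise result holds 24 cells for 12 real values, while the column-wise result holds exactly the 12 values.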
Other points to consider:
The copy=False flag will not help at all; it only matters when you are not concatenating blocks of the same dtype (and you indicated that yours do share a dtype).
pd.concat does use np.concatenate under the hood. If you think you can do better, then go for it.
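If you do want to try a hand-rolled version, a rough sketch could look like the following. It assumes every frame has the same row index and the same dtype, and that you are happy to discard the original column labels (as ignore_index=True does); the variable names are made up for illustration. It is a comparison point, not a claim that it will be faster or leaner.

```python
import numpy as np
import pandas as pd

# Small illustrative inputs: five float64 frames with identical row indexes.
frames = [pd.DataFrame(np.random.randn(4, 3)) for _ in range(5)]

# The pandas route, column-wise, with fresh integer column labels.
via_pandas = pd.concat(frames, axis=1, ignore_index=True)

# Hand-rolled route: concatenate the raw arrays once, then wrap the result.
# Only valid under the assumptions above (shared index, single dtype).
stacked = np.concatenate([f.to_numpy() for f in frames], axis=1)
via_numpy = pd.DataFrame(stacked, index=frames[0].index)

same_shape = via_pandas.shape == via_numpy.shape
same_values = np.array_equal(via_pandas.to_numpy(), via_numpy.to_numpy())
```

Either way, the output array must be allocated while the inputs are still alive, so peak memory is roughly inputs plus result.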
    import numpy as np
    import pandas as pd

    def make_frames(n=100, rows=100, cols=100):
        # n frames of random floats, each with 100 distinct column labels drawn from 0..109
        return [pd.DataFrame(np.random.randn(rows, cols),
                             columns=np.random.choice(110, 100, replace=False))
                for i in range(n)]

    In [28]: l = make_frames(rows=10000)

    In [29]: l[0].head()
    Out[29]:
       60 75 101 103 87 29 10 106 71 26 30 83 2 28 99 85 88 62 58 18 42 1 105 25 34 ... 102 27 22 \
    0 -0.854117 -0.007549 -0.510359 -0.993757 0.877635 -0.303199 -1.488548 1.179360 0.578095 0.807792 0.169930 -1.781403 0.204696 -0.515057 -0.954246 1.106073 0.666516 -1.146988 1.335709 0.362838 -0.675379 1.483469 0.670385 -0.483312 -0.703795 ... 1.322645 -1.942183 1.053502
    1 2.057542 0.860946 -0.037665 -0.347265 0.152562 -0.859537 1.431045 1.306419 0.623013 1.192325 0.909597 1.710507 1.319330 -0.402874 1.749581 1.223489 0.036354 0.140255 0.844330 -0.091447 -0.347245 0.259055 1.187882 -0.216858 -1.421336 ... 1.122068 0.887538 0.205854
    2 -0.077974 0.947503 0.688666 0.288104 -1.275329 -0.840847 -2.014090 -1.318507 -0.889416 -0.098005 0.055492 0.847597 -1.289428 -0.910093 0.201312 -1.699879 0.103062 -1.041608 0.379171 -1.089937 0.894626 -1.500215 -0.501182 0.042078 -0.840789 ... 0.539192 0.193256 0.196138
    3 0.291993 1.138577 1.061509 0.856553 1.118931 0.725806 -0.689776 1.337957 -1.009835 -0.976506 -0.392317 0.295876 0.092240 0.418201 0.473585 0.013809 -1.169947 0.424797 0.019051 -0.526189 0.066991 -0.268750 1.277004 -0.736560 -0.314987 ... 0.272045 -0.333272 0.573267
    4 -2.073985 -0.016950 -1.712770 0.286212 -0.159693 -0.495864 1.286450 -1.168880 1.031456 -3.080568 1.443880 -0.604405 0.406383 -0.162986 1.077255 1.160726 0.943949 -1.517681 -1.049972 1.208850 -0.859617 -0.145358 -0.638898 0.248012 -2.985845 ... -0.699697 0.051352 0.575304

       69 76 91 45 14 37 0 81 38 72 107 11 5 73 70 8 90 94 53 3 55 12
    0 -0.972965 -0.298674 1.283482 2.344092 -0.597735 -0.407978 0.971726 -0.935620 0.236889 -0.957096 -2.366399 -0.943760 0.293325 -0.240385 -0.392554 -0.887556 0.261402 -2.050122 -1.776865 -1.513899 -0.953916 0.630495
    1 -1.471033 0.269830 -0.744507 -0.982779 0.624527 -1.782704 1.197262 -0.297730 1.122939 -1.039226 0.171351 -0.828985 0.698245 0.563430 0.718177 0.682369 1.415918 0.049931 0.648000 1.785455 -0.190021 -1.329753
    2 -1.942792 0.560981 -0.353782 -1.637407 -1.495131 -0.593041 -1.617116 -0.910257 -0.506877 0.178378 -0.623986 0.302544 0.279309 -0.266409 0.780306 0.986510 -1.549847 0.063632 -0.480434 1.393221 -1.237682 1.577320
    3 0.468151 -1.002872 -0.147329 -0.420609 0.183696 0.527632 0.018911 -2.059989 1.642613 -0.428345 1.350693 -1.323321 -0.247263 0.331525 -2.036862 -2.593575 0.362101 -0.184095 0.419231 -0.633878 0.097499 -0.026044
    4 -0.581330 -0.848421 -0.682027 -1.260004 -0.357354 -0.304743 0.409537 -1.189925 -0.609352 -0.610345 -0.798009 0.219822 -0.681764 1.872736 1.738017 0.439148 1.012881 -0.934613 -1.007427 -0.390359 0.329949 0.486906

    [5 rows x 100 columns]
Concatenate, using axis=1 because this is a column-wise concat:
    In [31]: df = pd.concat(l, axis=1, ignore_index=True)

    In [32]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 10000 entries, 0 to 9999
    Columns: 10000 entries, 0 to 9999
    dtypes: float64(10000)
    memory usage: 763.0 MB
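The reported size is what a dense float64 block of that shape should take, which is a quick way to check that the result carries no padding. The sketch below does the arithmetic and confirms it on a much smaller frame; expected_mib is a hypothetical helper written for this illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

def expected_mib(rows, cols, dtype=np.float64):
    """Size of a dense rows x cols block of `dtype`, in MiB (hypothetical helper)."""
    return rows * cols * np.dtype(dtype).itemsize / 2**20

# 10000 x 10000 float64 values at 8 bytes each is 800,000,000 bytes.
full_size = expected_mib(10_000, 10_000)

# Same check on a small column-wise concat: 5 frames of 100 x 10 floats.
small = pd.concat(
    [pd.DataFrame(np.random.randn(100, 10)) for _ in range(5)],
    axis=1,
    ignore_index=True,
)
data_bytes = int(small.memory_usage(index=False).sum())  # column data only, no index
```

For the 10000 x 10000 case this comes to about 762.9 MiB for the data alone, in line with the figure df.info() prints once the row index is included.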
Timings
    In [33]: %timeit pd.concat(l, axis=1, ignore_index=True)
    1 loops, best of 3: 1.15 s per loop

    In [34]: %memit pd.concat(l, axis=1, ignore_index=True)
    peak memory: 2390.25 MiB, increment: 651.28 MiB