Pandas version of rbind
Ah, this is to do with how I created the DataFrame, not with how I was combining them. The long and the short of it is, if you are creating a frame using a loop and a statement that looks like this:
Frame = Frame.append(pandas.DataFrame(data = SomeNewLineOfData))
You must ignore the index
Frame = Frame.append(pandas.DataFrame(data = SomeNewLineOfData), ignore_index=True)
Or you will have issues later when combining data.
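Note that DataFrame.append was removed in pandas 2.0, so on current versions the same loop is usually written by collecting the pieces in a list and calling pd.concat once at the end. A minimal sketch, assuming each new line of data arrives as a dict; the `rows` list and its column names are made up for illustration:

```python
import pandas as pd

# Hypothetical input: each "new line of data" is a dict of column -> value.
rows = [{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}, {"col1": 5, "col2": 7}]

pieces = []
for row in rows:
    # One single-row frame per iteration; every piece has index [0].
    pieces.append(pd.DataFrame(data=[row]))

# ignore_index=True discards the per-piece indices and renumbers the rows.
frame = pd.concat(pieces, ignore_index=True)
print(frame)
```

Concatenating once after the loop also avoids copying the growing frame on every iteration.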
pd.concat will serve the purpose of rbind in R.
import pandas as pd
df1 = pd.DataFrame({'col1': [1,2], 'col2':[3,4]})
df2 = pd.DataFrame({'col1': [5,6], 'col2':[7,8]})
print(df1)
print(df2)
print(pd.concat([df1, df2]))
The output will look like:
   col1  col2
0     1     3
1     2     4
   col1  col2
0     5     7
1     6     8
   col1  col2
0     1     3
1     2     4
0     5     7
1     6     8
If you read the documentation carefully enough, it also explains other operations such as cbind.
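For example, a rough cbind equivalent is pd.concat with axis=1. This is a sketch only; `df3` and its column `col3` are hypothetical and not part of the example above:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df3 = pd.DataFrame({"col3": [9, 10]})  # hypothetical extra column

# axis=1 places the frames side by side, aligning rows on the index.
combined = pd.concat([df1, df3], axis=1)
print(combined)
```

Unlike R's cbind, rows are matched by index label rather than by position, so frames with different indices will produce NaN where labels do not line up.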
This worked for me:
import numpy as np
import pandas as pd
dates = np.asarray(pd.date_range('1/1/2000', periods=8))
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df2 = df1.copy()
df = df1.append(df2)
Yields:
                   A         B         C         D
2000-01-01 -0.327208  0.552500  0.862529  0.493109
2000-01-02  1.039844 -2.141089 -0.781609  1.307600
2000-01-03 -0.462831  0.066505 -1.698346  1.123174
2000-01-04 -0.321971 -0.544599 -0.486099 -0.283791
2000-01-05  0.693749  0.544329 -1.606851  0.527733
2000-01-06 -2.461177 -0.339378 -0.236275  0.155569
2000-01-07 -0.597156  0.904511  0.369865  0.862504
2000-01-08 -0.958300 -0.583621 -2.068273  0.539434
2000-01-01 -0.327208  0.552500  0.862529  0.493109
2000-01-02  1.039844 -2.141089 -0.781609  1.307600
2000-01-03 -0.462831  0.066505 -1.698346  1.123174
2000-01-04 -0.321971 -0.544599 -0.486099 -0.283791
2000-01-05  0.693749  0.544329 -1.606851  0.527733
2000-01-06 -2.461177 -0.339378 -0.236275  0.155569
2000-01-07 -0.597156  0.904511  0.369865  0.862504
2000-01-08 -0.958300 -0.583621 -2.068273  0.539434
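On pandas 2.0 and later, where DataFrame.append no longer exists, the same stacking can be sketched with pd.concat. This mirrors the snippet above rather than replacing it, and the random values will differ from run to run:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=8)
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])
df2 = df1.copy()

# Stack df2 under df1; the date index is kept, so every date appears twice.
df = pd.concat([df1, df2])
print(df)
```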
If you aren't already using the latest version of pandas, I highly recommend upgrading. It is now possible to work with DataFrames that contain duplicate indices.
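If the duplicates are not wanted, they can be detected and removed after the fact. A small sketch, assuming two toy frames that are not from the examples above:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2]})
df2 = pd.DataFrame({"col1": [5, 6]})

# Stacking keeps each frame's own labels, so the index is 0, 1, 0, 1.
stacked = pd.concat([df1, df2])
print(stacked.index.is_unique)  # False

# Selecting label 0 returns both rows that carry it.
print(stacked.loc[0])

# Drop the old labels and renumber the rows from zero.
clean = stacked.reset_index(drop=True)
print(clean)
```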