What is the meaning of "axis" attribute in a Pandas DataFrame?
Data:
```
In [55]: df1
Out[55]:
   x  y
a  1  3
b  2  4
c  3  5
d  4  6
e  5  7

In [56]: df2
Out[56]:
   y  z
b  1  9
c  3  8
d  5  7
e  7  6
f  9  5
```
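Here is a minimal sketch that rebuilds these two frames so the examples can be run. The values and index labels are copied from the output above, and the names `df1`/`df2` simply mirror the session.

```python
import pandas as pd

# Same data as In [55] / In [56]: the row labels a..f form the index
df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]}, index=list("bcdef"))

print(df1)
print(df2)
```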
Concatenated horizontally (`axis=1`), keeping only the index labels found in both DataFrames (rows are aligned by index):
```
In [57]: pd.concat([df1, df2], join='inner', axis=1)
Out[57]:
   x  y  y  z
b  2  4  1  9
c  3  5  3  8
d  4  6  5  7
e  5  7  7  6
```
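To check which labels survive, this sketch rebuilds the same data so it runs on its own and inspects the result. With `axis=1` the frames are placed side by side, and `join='inner'` keeps only the index labels `b` to `e` that appear in both. Note that `y` shows up twice, because `concat` does not merge same-named columns.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]}, index=list("bcdef"))

# axis=1: glue side by side; join='inner': keep only row labels present in both
side_by_side = pd.concat([df1, df2], join="inner", axis=1)

print(list(side_by_side.index))    # ['b', 'c', 'd', 'e']
print(list(side_by_side.columns))  # ['x', 'y', 'y', 'z']
```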
Concatenated vertically (the default, `axis=0`), keeping only the columns found in both DataFrames:
```
In [58]: pd.concat([df1, df2], join='inner')
Out[58]:
   y
a  3
b  4
c  5
d  6
e  7
b  1
c  3
d  5
e  7
f  9
```
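The same check for the vertical case, as a self-contained sketch: with the default `axis=0` the frames are stacked, so `join='inner'` now intersects the columns (only `y` is shared) while every row label is kept, including duplicates.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]}, index=list("bcdef"))

# axis=0 is the default: stack rows; join='inner' keeps only shared columns
stacked = pd.concat([df1, df2], join="inner")

print(list(stacked.columns))    # ['y']
print(stacked["y"].tolist())    # [3, 4, 5, 6, 7, 1, 3, 5, 7, 9]
print(stacked.index.is_unique)  # False: b, c, d, e each appear twice
```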
If you don't use the inner join (the default is `join='outer'`), you get this instead:
```
In [62]: pd.concat([df1, df2])
Out[62]:
     x  y    z
a  1.0  3  NaN
b  2.0  4  NaN
c  3.0  5  NaN
d  4.0  6  NaN
e  5.0  7  NaN
b  NaN  1  9.0
c  NaN  3  8.0
d  NaN  5  7.0
e  NaN  7  6.0
f  NaN  9  5.0

In [63]: pd.concat([df1, df2], axis=1)
Out[63]:
     x    y    y    z
a  1.0  3.0  NaN  NaN
b  2.0  4.0  1.0  9.0
c  3.0  5.0  3.0  8.0
d  4.0  6.0  5.0  7.0
e  5.0  7.0  7.0  6.0
f  NaN  NaN  9.0  5.0
```
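A small sketch to confirm the outer-join shapes. The union of labels is kept on the non-concatenation axis, and every cell with no source value is filled with `NaN`. That missing value is also why the integer columns are shown as floats above.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=list("abcde"))
df2 = pd.DataFrame({"y": [1, 3, 5, 7, 9], "z": [9, 8, 7, 6, 5]}, index=list("bcdef"))

stacked = pd.concat([df1, df2])               # axis=0, join='outer' (defaults)
side_by_side = pd.concat([df1, df2], axis=1)  # join='outer' (default)

print(stacked.shape)       # (10, 3)
print(side_by_side.shape)  # (6, 4)
# NaN cannot be stored in an int64 column, so 'x' is upcast to float
print(stacked["x"].dtype)  # float64
# Side by side: x and the first y are missing at 'f', the second y and z at 'a'
print(int(side_by_side.isna().sum().sum()))  # 4
```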
Here is my trick for remembering `axis`: read it together with the operation, and it becomes clear:
- axis 0 = rows
- axis 1 = columns
If you “sum” along axis=0, you are summing down all the rows, and the output is a single row with the same number of columns. If you “sum” along axis=1, you are summing across all the columns, and the output is a single column with the same number of rows.
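The mnemonic, applied to `df1` from the question in a short sketch. Strictly speaking, `DataFrame.sum` returns a Series in both cases: with `axis=0` it is indexed by the column labels (the "single row"), and with `axis=1` it is indexed by the row labels (the "single column").

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 4, 5, 6, 7]}, index=list("abcde"))

# axis=0: collapse the rows -> one total per column
col_totals = df1.sum(axis=0)
# axis=1: collapse the columns -> one total per row
row_totals = df1.sum(axis=1)

print(list(col_totals.index), col_totals.tolist())  # ['x', 'y'] [15, 25]
print(list(row_totals.index), row_totals.tolist())  # ['a', 'b', 'c', 'd', 'e'] [4, 6, 8, 10, 12]
```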