Use .corr to get the correlation between two columns
Without the actual data it is hard to answer the question, but I guess you are looking for something like this:
Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita'.
To give an example:
import pandas as pd
df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})

   A  B
0  0  0
1  1  2
2  2  4
3  3  6
Then
df['A'].corr(df['B'])
gives 1
as expected.
Now, if you change a value, e.g.
df.loc[2, 'B'] = 4.5

   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0
the command
df['A'].corr(df['B'])
returns
0.99586
which is still close to 1, as expected.
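To see where that number comes from, here is a minimal sketch that recomputes the coefficient by hand. It assumes the default method of .corr (Pearson) and the modified toy data from above; the names a_dev, b_dev, r_manual and r_pandas are just illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as above, with the modified value at index 2.
df = pd.DataFrame({'A': [0, 1, 2, 3], 'B': [0.0, 2.0, 4.5, 6.0]})

# Pearson's r: sum of products of deviations, divided by the
# square root of the product of the summed squared deviations.
a_dev = df['A'] - df['A'].mean()
b_dev = df['B'] - df['B'].mean()
r_manual = (a_dev * b_dev).sum() / np.sqrt((a_dev ** 2).sum() * (b_dev ** 2).sum())

# The built-in call should agree with the manual computation.
r_pandas = df['A'].corr(df['B'])
print(r_manual, r_pandas)  # both approximately 0.995862
```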
If you apply .corr directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe 1s on the diagonal of your matrix (each column is perfectly correlated with itself).
df.corr()
will therefore return
          A         B
A  1.000000  0.995862
B  0.995862  1.000000
In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).
There can be cases where you get NaNs in your result; check this post for an example.
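One situation where this can happen (not necessarily the one discussed in the linked post) is a column with no variation: its standard deviation is zero, so the coefficient is undefined. A small sketch with made-up data, where df_nan and the constant column 'C' are purely hypothetical:

```python
import math
import pandas as pd

# Hypothetical data: column 'C' is constant, so its standard deviation is zero.
df_nan = pd.DataFrame({'A': [0, 1, 2, 3], 'C': [5, 5, 5, 5]})

corr_matrix = df_nan.corr()
print(corr_matrix)

# The correlation between 'A' and the constant column is undefined (NaN).
print(math.isnan(corr_matrix.loc['A', 'C']))
```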
If you want to filter entries above/below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer, and if you then run into the issue of overlapping axis labels, check the following post.
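For the threshold case, one possible approach (a sketch, not necessarily what the linked question suggests) is to mask the correlation matrix with DataFrame.where. The data in df_demo, the extra column 'C' and the cutoff of 0.9 are arbitrary choices made for illustration.

```python
import pandas as pd

# Hypothetical data with three columns; the 0.9 threshold is an arbitrary choice.
df_demo = pd.DataFrame({
    'A': [0, 1, 2, 3],
    'B': [0.0, 2.0, 4.5, 6.0],
    'C': [1, -1, 1, -1],
})
corr_matrix = df_demo.corr()

# Keep entries whose absolute value exceeds the threshold; the rest become NaN.
strong = corr_matrix.where(corr_matrix.abs() > 0.9)
print(strong)
```

Here only the A/B pair (and the diagonal) survives the mask, since 'C' is only weakly correlated with the other two columns.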
I ran into the same issue. It appeared that Citable Documents per Person
was a plain Python float, and pandas somehow skips it by default. All the other columns of my dataframe had NumPy dtypes, so I solved it by converting the column to np.float64
Top15['Citable Documents per Person'] = np.float64(Top15['Citable Documents per Person'])
Remember that this is exactly the column you calculated yourself.
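If you want to check whether this applies to you, inspecting the dtypes before calling .corr may help. The following is only a sketch: top15_demo is a hypothetical stand-in for Top15 with made-up numbers, and it assumes the computed column ended up with object dtype. It uses pd.to_numeric followed by astype as an alternative to calling np.float64 directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Top15: the computed column has object dtype.
top15_demo = pd.DataFrame({
    'Energy Supply per Capita': [90.0, 120.0, 150.0, 200.0],
    'Citable Documents per Person': pd.Series([0.1, 0.2, 0.25, 0.4], dtype=object),
})
print(top15_demo.dtypes)  # the second column is reported as object

# Convert explicitly; errors='coerce' turns unparseable values into NaN.
top15_demo['Citable Documents per Person'] = pd.to_numeric(
    top15_demo['Citable Documents per Person'], errors='coerce'
).astype(np.float64)

# With both columns numeric, the correlation can be computed as usual.
r = top15_demo['Citable Documents per Person'].corr(
    top15_demo['Energy Supply per Capita']
)
print(top15_demo.dtypes)
print(r)
```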
My solution, after converting the data to a numeric type, would be:
Top15[['Citable docs per Capita','Energy Supply per Capita']].corr()