Linear Regression on Pandas DataFrame using Sklearn (IndexError: tuple index out of range)
Let's assume your csv looks something like:
c1,c2
0.000000,0.968012
1.000000,2.712641
2.000000,11.958873
3.000000,10.889784
...
I generated the data as such:
import numpy as np
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt

length = 10
x = np.arange(length, dtype=float).reshape((length, 1))
y = x + (np.random.rand(length)*10).reshape((length, 1))
This data is saved to test.csv (just so you know where it came from; you will of course use your own).
import pandas as pd

data = pd.read_csv('test.csv', index_col=False, header=0)
x = data.c1.values
y = data.c2.values
print(x)  # prints: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
You need to take a look at the shape of the data you are feeding into .fit(). Here x.shape = (10,) but we need it to be (10, 1), see sklearn. Same goes for y. So we reshape:
x = x.reshape(-1, 1)  # -1 lets NumPy infer the number of rows from the data
y = y.reshape(-1, 1)
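The shape requirement can be checked directly before calling .fit(). The following is a minimal sketch, assuming a made-up 1-D array in place of the values read from the CSV; the names shape_before, shape_after and x_2d are only illustrative.

```python
import numpy as np

# Hypothetical 1-D feature array, standing in for `data.c1.values`
x = np.arange(10, dtype=float)
shape_before = x.shape       # 1-D: ten samples, no explicit feature axis

# -1 lets NumPy infer the number of rows; 1 is the number of feature columns
x_2d = x.reshape(-1, 1)
shape_after = x_2d.shape     # 2-D: ten rows (samples), one column (feature)

print(shape_before, shape_after)
```

Using -1 instead of a hard-coded length means the same line should work whatever the number of rows in your own file.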
Now we create the regression object and then call fit():
regr = linear_model.LinearRegression()
regr.fit(x, y)

# plot it as in the example at http://scikit-learn.org/
plt.scatter(x, y, color='black')
plt.plot(x, regr.predict(x), color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
See sklearn linear regression example.
Dataset
Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
Importing the dataset
dataset = pd.read_csv('1.csv')
X = dataset[["mark1"]]
y = dataset[["mark2"]]
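The double brackets in the selection above are what keep the data two-dimensional, so no reshape is needed. Here is a small sketch of the difference, assuming a made-up DataFrame in place of the contents of 1.csv (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the contents of '1.csv'
dataset = pd.DataFrame({"mark1": [10.0, 20.0, 30.0],
                        "mark2": [15.0, 25.0, 35.0]})

as_series = dataset["mark1"]     # single brackets: a 1-D Series
as_frame = dataset[["mark1"]]    # double brackets: a 2-D DataFrame with one column

print(as_series.shape)
print(as_frame.shape)
```

The one-column DataFrame already has the (n_samples, n_features) layout that fit() expects for X.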
Fitting Simple Linear Regression to the set
regressor = LinearRegression()
regressor.fit(X, y)
Predicting the set results
y_pred = regressor.predict(X)
Visualising the set results
plt.scatter(X, y, color='red')
plt.plot(X, regressor.predict(X), color='blue')
plt.title('mark1 vs mark2')
plt.xlabel('mark1')
plt.ylabel('mark2')
plt.show()
make predictions based on the result?
To predict,
lr = linear_model.LinearRegression().fit(X, Y)
lr.predict(X)
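Predicting for values that were not in the training set works the same way, as long as the new inputs are also 2-D. A minimal sketch, assuming invented training data that lies exactly on y = 2x + 1 (X_new and Y_new are illustrative names):

```python
import numpy as np
from sklearn import linear_model

# Made-up training data lying exactly on the line y = 2x + 1
X = np.arange(5, dtype=float).reshape(-1, 1)
Y = 2.0 * X + 1.0

lr = linear_model.LinearRegression().fit(X, Y)

# New inputs must be 2-D too: one row per sample, one column per feature
X_new = np.array([[10.0], [20.0]])
Y_new = lr.predict(X_new)

print(Y_new)
```

Because Y was 2-D here, the predictions come back as a 2-D array with one row per new sample.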
Is there any way I can view details of the regression?
The LinearRegression object has coef_ and intercept_ attributes.
lr.coef_
lr.intercept_
These show the slope and intercept, respectively.
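As a quick check of what these attributes hold, here is a sketch on invented data generated from y = 3x - 2, so the expected slope and intercept are known in advance. The score() call, which returns the R^2 of the fit, is an addition not mentioned above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 3x - 2
X = np.arange(5, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() - 2.0        # 1-D target

lr = LinearRegression().fit(X, y)

slope = lr.coef_[0]              # coef_ holds one entry per feature
intercept = lr.intercept_        # a scalar when y is 1-D
r2 = lr.score(X, y)              # coefficient of determination, R^2

print(slope, intercept, r2)
```

Note that if y is passed as a 2-D column of shape (n, 1), coef_ comes back with shape (1, 1) and intercept_ with shape (1,), so the indexing needs to be adjusted accordingly.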