Linear Regression on Pandas DataFrame using Sklearn (IndexError: tuple index out of range)


Let's assume your csv looks something like:

c1,c2
0.000000,0.968012
1.000000,2.712641
2.000000,11.958873
3.000000,10.889784
...

I generated the data as such:

import numpy as np
import pandas as pd
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt

length = 10
x = np.arange(length, dtype=float).reshape((length, 1))
y = x + (np.random.rand(length)*10).reshape((length, 1))

This data is saved to test.csv (just so you know where it came from, obviously you'll use your own).
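For completeness, here is one way the generated arrays could have been written to test.csv. This is a sketch, not necessarily how the file was actually produced. The column names c1 and c2 come from the CSV shown above. The seeded random generator and the temporary directory are assumptions, added so the example is reproducible and does not overwrite an existing file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

length = 10
rng = np.random.default_rng(0)  # assumption: seeded so the run is reproducible
x = np.arange(length, dtype=float).reshape((length, 1))
y = x + (rng.random(length) * 10).reshape((length, 1))

# Flatten the (10, 1) arrays into 1-D columns and write them without the index.
csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")  # assumption: temp location
pd.DataFrame({"c1": x.ravel(), "c2": y.ravel()}).to_csv(csv_path, index=False)

# Read the file back to confirm the layout.
data = pd.read_csv(csv_path)
print(data.head())
```

Writing with index=False keeps the file to the two columns c1 and c2, so no extra index column appears when it is read back.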

data = pd.read_csv('test.csv', index_col=False, header=0)
x = data.c1.values
y = data.c2.values
print(x)  # prints: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

You need to take a look at the shape of the data you are feeding into .fit().

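Here x.shape is (10,), but fit() expects a 2-D array of shape (n_samples, n_features), which here is (10, 1); see the sklearn documentation. y is allowed to stay 1-D, but giving it the shape (10, 1) as well keeps it consistent with the generated data. So we reshape: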

x = x.reshape(-1, 1)  # -1 infers the number of rows, so this also works for your own data
y = y.reshape(-1, 1)
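The effect of the reshape can be checked directly on the shapes. The sketch below uses a noise-free toy target (an assumption, chosen so the fitted slope and intercept are easy to recognise) and leaves y one-dimensional to show that only the feature array has to be 2-D.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(10, dtype=float)   # shape (10,): a flat vector
y = 3.0 * x + 2.0                # assumption: toy target without noise

X = x.reshape(-1, 1)             # shape (10, 1): 10 samples, 1 feature
print(x.shape, X.shape)

# The feature array must be 2-D; the target may stay 1-D.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
```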

Now we create the regression object and then call fit():

regr = linear_model.LinearRegression()
regr.fit(x, y)

# plot it as in the example at http://scikit-learn.org/
plt.scatter(x, y, color='black')
plt.plot(x, regr.predict(x), color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

See the sklearn linear regression example.


Dataset

(Image of the dataset: a CSV file, 1.csv, with the columns mark1 and mark2.)

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

Importing the dataset

dataset = pd.read_csv('1.csv')
X = dataset[["mark1"]]
y = dataset[["mark2"]]
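The double brackets in the selection above matter: they return a one-column DataFrame, which is already 2-D, whereas single brackets return a 1-D Series. A small sketch of the difference follows. The values are a hypothetical stand-in, since the contents of 1.csv are not shown.

```python
import pandas as pd

# Hypothetical stand-in for 1.csv; the real values are not shown in the source.
dataset = pd.DataFrame({"mark1": [10, 20, 30, 40], "mark2": [15, 25, 35, 45]})

as_series = dataset["mark1"]    # single brackets: 1-D Series
as_frame = dataset[["mark1"]]   # double brackets: 2-D DataFrame with one column

print(as_series.shape, as_frame.shape)
```

Selecting with double brackets therefore gives fit() a 2-D input without a separate reshape step.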

Fitting Simple Linear Regression to the set

regressor = LinearRegression()
regressor.fit(X, y)

Predicting the set results

y_pred = regressor.predict(X)

Visualising the set results

plt.scatter(X, y, color='red')
plt.plot(X, regressor.predict(X), color='blue')
plt.title('mark1 vs mark2')
plt.xlabel('mark1')
plt.ylabel('mark2')
plt.show()



How can I make predictions based on the result?

To predict, fit the model and then call predict() on a 2-D array of inputs:

lr = linear_model.LinearRegression().fit(X, Y)
lr.predict(X)
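predict() is not limited to the training data. Any new inputs work, provided they have the same 2-D layout of (n_samples, n_features). The sketch below uses a hypothetical noise-free dataset (y = 2x + 1), so the predictions are easy to check by hand.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data following y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = 2.0 * X + 1.0                    # shape (4, 1)

lr = linear_model.LinearRegression().fit(X, Y)

# New inputs must also be 2-D: one row per sample, one column per feature.
X_new = np.array([[4.0], [5.0]])
Y_new = lr.predict(X_new)            # same layout as Y: shape (2, 1)
print(Y_new)
```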

Is there any way I can view details of the regression?

A fitted LinearRegression has coef_ and intercept_ attributes:

lr.coef_
lr.intercept_

These hold the slope and the intercept of the fitted line.
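As a quick check of what these attributes mean, the predictions can be rebuilt by hand as slope * x + intercept. The sketch below assumes a hypothetical noise-free dataset and a 1-D target. In that case coef_ has shape (n_features,) and intercept_ is a scalar; with a 2-D target, coef_ has shape (n_targets, n_features) and intercept_ is an array.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical data following y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])   # 1-D target

lr = linear_model.LinearRegression().fit(X, Y)

slope = lr.coef_[0]        # one coefficient per feature
intercept = lr.intercept_  # scalar for a 1-D target

# Rebuild the predictions from the two fitted parameters.
manual = slope * X.ravel() + intercept
print(slope, intercept)
```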