ValueError: illegal value in 4-th argument of internal None when running sklearn LinearRegression().fit()
It seems this only happens when you show the figure with matplotlib; otherwise you can run the fit as many times as you like.
However, if you change the data type from float64 to float32 (as in Grzesik's answer), the error strangely disappears. This feels like a bug to me. Why would changing the data type affect the interaction between matplotlib and the lapack_function within sklearn?
More a question than an answer, but it is a bit scary to find these unexpected interactions across functions and data types.
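One part of this can be checked directly: SciPy chooses the compiled LAPACK routine from the dtype of the input array, so float32 and float64 data end up in different functions. The sketch below assumes SciPy's `get_lapack_funcs` helper and the `gelsd` driver (the default driver of `scipy.linalg.lstsq`); the helper name `lapack_prefix` is made up for illustration. It does not explain the interaction with matplotlib, only why the dtype can matter at all.

```python
import numpy as np
from scipy.linalg import get_lapack_funcs


def lapack_prefix(dtype):
    """Return the LAPACK type prefix SciPy selects for an array of this dtype."""
    a = np.zeros((2, 1), dtype=dtype)
    # get_lapack_funcs resolves the generic name "gelsd" to a typed routine
    (gelsd,) = get_lapack_funcs(("gelsd",), (a,))
    return gelsd.typecode


print(lapack_prefix(np.float32))  # 's', i.e. the single-precision routine sgelsd
print(lapack_prefix(np.float64))  # 'd', i.e. the double-precision routine dgelsd
```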
    import numpy as np
    import sklearn
    from sklearn.linear_model import LinearRegression
    import matplotlib.pyplot as plt

    def main(print_matplotlib=False, dtype=np.float64):
        x = np.linspace(-3, 3, 100).astype(dtype)
        print(x.dtype)
        y = 2*np.random.rand(x.shape[0])*x + np.random.rand(x.shape[0])
        x = x.reshape((-1, 1))

        reg = LinearRegression().fit(x, y)
        print(reg.intercept_, reg.coef_)
        yh = reg.predict(x)

        if print_matplotlib:
            plt.scatter(x, y)
            plt.plot(x, yh)
            plt.show()
No plotting
    if __name__ == "__main__":
        np.random.seed(64)
        main(print_matplotlib = False, dtype=np.float64)
        np.random.seed(64)
        main(print_matplotlib = False, dtype=np.float64)
        pass

Output:

    float64
    0.5957165420019624 [0.91960601]
    float64
    0.5957165420019624 [0.91960601]
Plotting dtype = np.float64
    if __name__ == "__main__":
        np.random.seed(64)
        main(print_matplotlib = True, dtype=np.float64)
        np.random.seed(64)
        main(print_matplotlib = True, dtype=np.float64)
        pass

Output:

    float64
    0.5957165420019624 [0.91960601]
    float64
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-52593a548324> in <module>
          3 main(print_matplotlib = True)
          4 np.random.seed(64)
    ----> 5 main(print_matplotlib = True)
          6
          7 pass

    <ipython-input-1-11139051f2d3> in main(print_matplotlib, dtype)
         11     x = x.reshape((-1,1))
         12
    ---> 13     reg=LinearRegression().fit(x,y)
         14     print(reg.intercept_,reg.coef_)
         15

    ~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
        545         else:
        546             self.coef_, self._residues, self.rank_, self.singular_ = \
    --> 547                 linalg.lstsq(X, y)
        548             self.coef_ = self.coef_.T
        549

    ~\AppData\Roaming\Python\Python37\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
       1249         if info < 0:
       1250             raise ValueError('illegal value in %d-th argument of internal %s'
    -> 1251                              % (-info, lapack_driver))
       1252         resids = np.asarray([], dtype=x.dtype)
       1253         if m > n:

    ValueError: illegal value in 4-th argument of internal None
Plotting dtype=np.float32
    if __name__ == "__main__":
        np.random.seed(64)
        main(print_matplotlib = True, dtype=np.float32)
        np.random.seed(64)
        main(print_matplotlib = True, dtype=np.float32)
        pass
With numpy 1.19.1 and sklearn v0.23.2, I found that polyfit(deg=1) and LinearRegression().fit() raised unexpected errors for no apparent reason. The data did not contain any NaN or Inf values. I eventually used scipy.stats.linregress() instead.

    from scipy import stats

    slope, intercept, r_value, p_value, std_err = stats.linregress(x.astype(np.float32), y.astype(np.float32))
First check for NaN and inf values, and also try normalize=True:

    # X, y stand for your own training data
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)
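The NaN/inf check mentioned above can be sketched as follows. The helper name `count_non_finite` is made up for illustration; it only relies on `np.isfinite`, which is False for NaN, +inf and -inf.

```python
import numpy as np


def count_non_finite(arr):
    """Count the entries of arr that are NaN, +inf or -inf."""
    arr = np.asarray(arr, dtype=float)
    return int(np.count_nonzero(~np.isfinite(arr)))


x = np.array([1.0, 2.0, np.nan, np.inf])
print(count_non_finite(x))  # 2
```

If this returns 0 for both the features and the targets, non-finite values can be ruled out as the cause.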
But neither of these worked for me, and my data didn't have any NaN or inf values. While experimenting, I found that running the same code a second time works, hence I did this:

    # X, y stand for your own training data
    try:
        lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)
    except ValueError:
        # the first call fails, the identical second call succeeds
        lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)

I don't know why this works, but it solved the problem for me.
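The "run it twice" workaround can also be written as a small generic helper, so the fit call is not duplicated. This is only a sketch: `call_with_retry` is a made-up name, and retrying on `ValueError` is an assumption based on the traceback in this thread.

```python
def call_with_retry(func, attempts=2, exceptions=(ValueError,)):
    """Call func(); if it raises one of `exceptions`, try again.

    At most `attempts` calls are made in total. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
```

With scikit-learn this could be used as `lreg = call_with_retry(lambda: LinearRegression().fit(X, y))`, where `X` and `y` are your own data. It hides the symptom, not the cause, so it is worth keeping the retry narrow (one exception type, two attempts).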