Numpy hstack - "ValueError: all the input arrays must have same number of dimensions" - but they do
As X is a sparse array, instead of numpy.hstack, use scipy.sparse.hstack to join the arrays. In my opinion the error message is kind of misleading here.
This minimal example illustrates the situation:
    import numpy as np
    from scipy import sparse

    X = sparse.rand(10, 10000)
    xt = np.random.random((10, 1))
    print('X shape:', X.shape)
    print('xt shape:', xt.shape)
    print('Stacked shape:', np.hstack((X, xt)).shape)
    # print('Stacked shape:', sparse.hstack((X, xt)).shape)  # This works
Based on the following output
    X shape: (10, 10000)
    xt shape: (10, 1)
one might expect the np.hstack call on the following line to work, but in fact it throws this error:
ValueError: all the input arrays must have same number of dimensions
So, use scipy.sparse.hstack when you have a sparse array to stack.
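As a rough sketch of the fix (the variable names mirror the example above, and the density value is an arbitrary choice for illustration), scipy.sparse.hstack accepts a mix of sparse and dense blocks and returns a sparse result:

```python
import numpy as np
from scipy import sparse

# Same shapes as the example above; density=0.01 is an arbitrary choice.
X = sparse.rand(10, 10000, density=0.01, format='csr')
xt = np.random.random((10, 1))  # dense column to append

# sparse.hstack converts each block to a sparse format before joining them.
stacked = sparse.hstack((X, xt))

print('Stacked shape:', stacked.shape)
print('Still sparse:', sparse.issparse(stacked))
```

The result stays sparse, so the large X is never expanded into a dense array.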
In fact, I already answered this in a comment on another of your questions, and you mentioned that a different error message then pops up:
TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
First of all, AllAlexaAndGoogleInfo does not have a dtype as it is a DataFrame. To get its underlying numpy array, simply use AllAlexaAndGoogleInfo.values. Check its dtype. Based on the error message, it has a dtype of object, which means that it might contain non-numerical elements like strings.
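A quick way to run that check, sketched with a small made-up DataFrame (df and its column names are hypothetical stand-ins for AllAlexaAndGoogleInfo, which I can't see):

```python
import pandas as pd

# Hypothetical stand-in for AllAlexaAndGoogleInfo: one clean numeric
# column and one column that contains a stray string.
df = pd.DataFrame({'rank': [1.0, 2.0, 3.0],
                   'links': [10, 'n/a', 30]})

values = df.values      # underlying numpy array
print(values.dtype)     # object, because of the mixed 'links' column
print(df.dtypes)        # per-column dtypes point at the culprit

# Names of the columns whose dtype is object; these need a closer look.
object_cols = df.dtypes[df.dtypes == object].index.tolist()
print(object_cols)
```

A single object column is enough to make the whole .values array have dtype object.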
This is a minimal example that reproduces this situation:
    import numpy as np
    from scipy import sparse

    X = sparse.rand(100, 10000)
    xt = np.random.random((100, 1))
    xt = xt.astype('object')  # Comment this out to fix the error
    print('X:', X.shape, X.dtype)
    print('xt:', xt.shape, xt.dtype)
    print('Stacked shape:', sparse.hstack((X, xt)).shape)
The error message:
TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
So, check whether there are any non-numeric values in AllAlexaAndGoogleInfo and repair them before doing the stacking.
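One possible repair, shown as a sketch rather than a recommendation: coerce every column to numeric with pandas.to_numeric, decide what to do with the entries that could not be parsed (filling them with 0 here is purely an assumption), and only then stack. The DataFrame df is again a hypothetical stand-in for AllAlexaAndGoogleInfo.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for AllAlexaAndGoogleInfo with one bad entry.
df = pd.DataFrame({'rank': [1.0, 2.0, 3.0],
                   'links': [10, 'n/a', 30]})

# Turn unparseable entries into NaN, column by column.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Assumption: replacing the NaNs with 0 is acceptable for this data.
dense = numeric.fillna(0).values.astype(np.float64)

# Small sparse matrix with a matching number of rows, for illustration.
X = sparse.rand(3, 100, density=0.1, format='csr')
stacked = sparse.hstack((X, dense))
print('Stacked:', stacked.shape, stacked.dtype)
```

Whether to fill, drop, or fix the offending rows by hand depends on what the bad values actually mean in your data.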
Try:
X = np.hstack((X, AllAlexaAndGoogleInfo.values))
I don't have a working Pandas installation, so I can't test it. But the DataFrame documentation describes values as the "Numpy representation of NDFrame". np.hstack is a numpy function, and as such knows nothing about the internal structure of the DataFrame.