Numpy hstack - "ValueError: all the input arrays must have same number of dimensions" - but they do


As X is a sparse array, instead of numpy.hstack, use scipy.sparse.hstack to join the arrays. In my opinion the error message is kind of misleading here.

This minimal example illustrates the situation:

    import numpy as np
    from scipy import sparse

    X = sparse.rand(10, 10000)
    xt = np.random.random((10, 1))
    print('X shape:', X.shape)
    print('xt shape:', xt.shape)
    print('Stacked shape:', np.hstack((X, xt)).shape)  # raises ValueError
    # print('Stacked shape:', sparse.hstack((X, xt)).shape)  # This works

Based on the following output

    X shape: (10, 10000)
    xt shape: (10, 1)

one might expect the np.hstack call on the next line to work, but it raises this error:

ValueError: all the input arrays must have same number of dimensions

So, use scipy.sparse.hstack when you have a sparse array to stack.
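A minimal sketch of the working variant follows. The name `stacked`, the density, and the seed are illustrative choices rather than anything from the question; the point is only that scipy.sparse.hstack accepts the sparse matrix together with the dense column and returns a sparse result.

```python
import numpy as np
from scipy import sparse

# Sparse feature matrix (illustrative size, density, and seed).
X = sparse.random(10, 10000, density=0.01, format="csr", random_state=0)

# Dense column with the same number of rows as X.
xt = np.random.random((10, 1))

# sparse.hstack handles the sparse/dense mix and returns a sparse matrix.
stacked = sparse.hstack((X, xt))

print("Stacked shape:", stacked.shape)
print("Still sparse:", sparse.issparse(stacked))
```

The result stays sparse, so the 10 x 10000 matrix is never expanded into a dense array.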


In fact, I already answered this in a comment on another of your questions, and you mentioned that a different error message then pops up:

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

First of all, AllAlexaAndGoogleInfo does not have a single dtype, as it is a DataFrame. To get its underlying numpy array, simply use AllAlexaAndGoogleInfo.values, then check that array's dtype. Based on the error message, it has a dtype of object, which means that it probably contains non-numerical elements such as strings.
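As a rough illustration of that check, the sketch below uses a small made-up DataFrame named `info` as a stand-in for AllAlexaAndGoogleInfo; its column names and contents are hypothetical, since the real data is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AllAlexaAndGoogleInfo: one numeric column
# and one column that holds strings.
info = pd.DataFrame({
    "alexa_rank": [1.0, 2.5, 3.0],
    "site": ["a.com", "b.com", "c.com"],
})

# Per-column dtypes show which column is not numeric.
print(info.dtypes)

# A mix of numbers and strings collapses to a single object array.
mixed_dtype = info.values.dtype
print("Whole frame:", mixed_dtype)

# Selecting only the numeric column gives a float array.
numeric_dtype = info[["alexa_rank"]].values.dtype
print("Numeric column only:", numeric_dtype)
```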

This is a minimal example that reproduces this situation:

    import numpy as np
    from scipy import sparse

    X = sparse.rand(100, 10000)
    xt = np.random.random((100, 1))
    xt = xt.astype('object')  # Comment this out to fix the error
    print('X:', X.shape, X.dtype)
    print('xt:', xt.shape, xt.dtype)
    print('Stacked shape:', sparse.hstack((X, xt)).shape)

The error message:

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

So, check whether there are any non-numerical values in AllAlexaAndGoogleInfo and repair them before doing the stacking.
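One possible way to do that repair is sketched below. The frame `info`, its columns, and the choice to replace unparseable entries with 0.0 are all assumptions made for illustration; the right repair depends on what the real data contains.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for AllAlexaAndGoogleInfo with a string column
# that contains one entry that is not a number.
info = pd.DataFrame({
    "alexa_rank": [1, 2, 3],
    "google_rank": ["4", "n/a", "6"],
})

# Convert every column to numbers; unparseable entries become NaN.
cleaned = info.apply(pd.to_numeric, errors="coerce")

# Assumption: replacing missing values with 0.0 is acceptable here.
cleaned = cleaned.fillna(0.0)

# Plain float64 array instead of an object array.
dense = cleaned.to_numpy(dtype=np.float64)

# Small sparse matrix with the same number of rows (illustrative).
X = sparse.random(3, 5, density=0.5, format="csr", random_state=0)

stacked = sparse.hstack((X, dense)).tocsr()
print("Stacked shape:", stacked.shape)
```

Dropping the offending rows or columns instead of filling them is an equally valid repair; the only requirement for the stacking step is that the array ends up with a numeric dtype.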


Use np.column_stack, like so:

X = np.column_stack((X, AllAlexaAndGoogleInfo))

From the docs:

Take a sequence of 1-D arrays and stack them as columns to make a single 2-D array. 2-D arrays are stacked as-is, just like with hstack.
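The behaviour described in that docstring can be seen with plain dense arrays. The arrays below are made up for illustration and are unrelated to the data in the question.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Two 1-D arrays become the two columns of a 2-D array.
cols = np.column_stack((a, b))
print(cols.shape)

# A 2-D array is kept as-is and the 1-D array is appended as a new column.
m = np.arange(6).reshape(3, 2)
both = np.column_stack((m, a))
print(both.shape)
```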


Try:

X = np.hstack((X, AllAlexaAndGoogleInfo.values))

I don't have a working Pandas installation, so I can't test it. But the DataFrame documentation describes values as the "Numpy representation of NDFrame". np.hstack is a numpy function, and as such knows nothing about the internal structure of the DataFrame.