How to convert a pandas DataFrame subset of columns AND rows into a numpy array?


Use .values directly:

In [79]: df[df.c > 0.5][['b', 'e']].values
Out[79]:
array([[ 0.98836259,  0.82403141],
       [ 0.337358  ,  0.02054435],
       [ 0.29271728,  0.37813099],
       [ 0.70033513,  0.69919695]])
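For reference, here is a minimal self-contained sketch of the same selection. The seeded random data and the 6x5 shape are assumptions for illustration only, not the frame from the question. It also shows .to_numpy(), which newer pandas versions (0.24 and later) offer alongside .values.

```python
import numpy as np
import pandas as pd

# Illustrative data: the seed and shape are assumptions, not from the original post.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 5)), columns=list("abcde"))

# Keep rows where column c exceeds 0.5, then keep only columns b and e.
subset = df[df["c"] > 0.5][["b", "e"]]

arr_values = subset.values      # attribute used in the answer above
arr_numpy = subset.to_numpy()   # method form, available from pandas 0.24
```

Both calls return a plain two-column numpy array with one row per matching DataFrame row.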


For the first problem, perhaps something like this. You can simply access the columns by their names:

>>> df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
>>> df[df['c']>.5][['b','e']]
          b         e
1  0.071146  0.132145
2  0.495152  0.420219

For the second problem:

>>> df[df['c']>.5][['b','e']].values
array([[ 0.07114556,  0.13214495],
       [ 0.49515157,  0.42021946]])


.loc accepts row and column selectors simultaneously (as do .ix/.iloc, FYI; note that .ix has since been removed from pandas). This is done in a single pass as well.

In [1]: df = DataFrame(np.random.rand(4,5), columns = list('abcde'))

In [2]: df
Out[2]:
          a         b         c         d         e
0  0.669701  0.780497  0.955690  0.451573  0.232194
1  0.952762  0.585579  0.890801  0.643251  0.556220
2  0.900713  0.790938  0.952628  0.505775  0.582365
3  0.994205  0.330560  0.286694  0.125061  0.575153

In [5]: df.loc[df['c']>0.5,['a','d']]
Out[5]:
          a         d
0  0.669701  0.451573
1  0.952762  0.643251
2  0.900713  0.505775

And if you want the values (though the frame should pass directly to sklearn as is, since frames support the array interface):

In [6]: df.loc[df['c']>0.5,['a','d']].values
Out[6]:
array([[ 0.66970138,  0.45157274],
       [ 0.95276167,  0.64325143],
       [ 0.90071271,  0.50577509]])
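As a rough sketch of the two points above: the snippet below selects rows and columns in one .loc call, then converts the result with np.asarray, which relies on the frame's array interface. The seeded data and the names mask, selected, as_array and chained are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data: the seed and shape are assumptions, not from the original post.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 5)), columns=list("abcde"))

mask = df["c"] > 0.5

# Row mask and column list passed together in a single indexing call.
selected = df.loc[mask, ["a", "d"]]

# np.asarray works because DataFrames support numpy's array interface.
as_array = np.asarray(selected)

# The chained form from the earlier answers gives the same numbers.
chained = df[mask][["a", "d"]].to_numpy()
```

The single .loc call avoids building the intermediate row-filtered frame that the chained form creates before picking columns.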