
Using NumPy to convert user/item ratings into 2-D array


If I understand you correctly, this is a pivot operation. With pandas it looks as follows.

Load data:

    import pandas as pd

    # fname is the path to the whitespace-separated ratings file
    df = pd.read_csv(fname, sep=r'\s+', header=None)
    df.columns = ['User', 'Item', 'ItemRating']

Pivot it:

    >>> df
       User  Item  ItemRating
    0     1    23           3
    1     2   204           4
    2     1   492           2
    3     3    23           4
    >>> df.pivot(index='User', columns='Item', values='ItemRating')
    Item  23   204  492
    User
    1       3  NaN    2
    2     NaN    4  NaN
    3       4  NaN  NaN
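Since the question asks for a 2-D array rather than a DataFrame, here is a minimal sketch of getting from the pivoted frame to a plain NumPy array. It assumes missing ratings should become 0 and inlines the sample data so it runs on its own; the names `raw`, `pivoted`, `ratings`, `user_ids` and `item_ids` are only illustrative.

```python
import io

import pandas as pd

# Sample ratings from above, inlined so the sketch is self-contained.
raw = """1 23 3
2 204 4
1 492 2
3 23 4"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["User", "Item", "ItemRating"])

pivoted = df.pivot(index="User", columns="Item", values="ItemRating")

# Assumption: a missing rating is represented as 0.
# fillna(0) removes the NaNs, to_numpy drops the row/column labels.
ratings = pivoted.fillna(0).to_numpy(dtype=int)

# Keep the labels around to map array positions back to user/item ids.
user_ids = pivoted.index.to_numpy()
item_ids = pivoted.columns.to_numpy()
```

Row `i` of `ratings` then belongs to `user_ids[i]` and column `j` to `item_ids[j]`.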

For a NumPy example, let's emulate a file with StringIO:

    from io import StringIO  # on Python 2: from StringIO import StringIO
    import numpy as np

    data = """1     23    3
    2     204   4
    1     492   2
    3     23    4"""

and load it:

    >>> arr = np.genfromtxt(StringIO(data), dtype=int)
    >>> arr
    array([[  1,  23,  3],
           [  2, 204,  4],
           [  1, 492,  2],
           [  3,  23,  4]])

The pivot is based on this answer:

    # Unique user/item ids, plus each record's position among them
    rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
    cols, col_pos = np.unique(arr[:, 1], return_inverse=True)

    # Start from an all-zero table and scatter the ratings into place
    pivot_table = np.zeros((len(rows), len(cols)), dtype=arr.dtype)
    pivot_table[row_pos, col_pos] = arr[:, 2]

and the result:

    >>> pivot_table
    array([[3, 0, 2],
           [0, 4, 0],
           [4, 0, 0]])

Note that the results differ: pandas marks missing user/item combinations as NaN, while the NumPy approach sets them to zero.
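If a zero needs to stay distinguishable from "no rating", the NumPy approach can be adapted to use NaN as well. This is only a sketch under that assumption: it swaps np.zeros for np.full with np.nan, which forces a float array, and `pivot_nan` is just an illustrative name.

```python
import numpy as np

# Same sample records as above: user, item, rating
arr = np.array([[1, 23, 3],
                [2, 204, 4],
                [1, 492, 2],
                [3, 23, 4]])

rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
cols, col_pos = np.unique(arr[:, 1], return_inverse=True)

# NaN has no integer representation, so the table must be float.
pivot_nan = np.full((len(rows), len(cols)), np.nan)
pivot_nan[row_pos, col_pos] = arr[:, 2]
```

The price is the float dtype; if integer ratings matter more than telling missing values apart, the zero-filled version is the simpler choice.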

Pick whichever suits you better ;)