Using NumPy to convert user/item ratings into 2-D array
If I understand you correctly, this is a pivot. With pandas it looks like this.
Load data:
    import pandas as pd

    # raw string so that \s is passed to the regex engine unchanged
    df = pd.read_csv(fname, sep=r'\s+', header=None)
    df.columns = ['User', 'Item', 'ItemRating']
Pivot it:
    >>> df
       User  Item  ItemRating
    0     1    23           3
    1     2   204           4
    2     1   492           2
    3     3    23           4
    >>> df.pivot(index='User', columns='Item', values='ItemRating')
    Item  23   204  492
    User
    1       3  NaN    2
    2     NaN    4  NaN
    3       4  NaN  NaN
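To try the pandas version without a file on disk, here is a minimal self-contained sketch. The in-memory `data` string is an assumption standing in for the real file behind `fname`, and `filled` is just an illustrative name for a zero-filled copy.

```python
from io import StringIO

import pandas as pd

# Sample rows standing in for the real file (assumed contents,
# matching the frame printed above): user, item, rating.
data = """1 23 3
2 204 4
1 492 2
3 23 4"""

df = pd.read_csv(StringIO(data), sep=r'\s+', header=None,
                 names=['User', 'Item', 'ItemRating'])

# One row per user, one column per item; missing pairs become NaN.
pivoted = df.pivot(index='User', columns='Item', values='ItemRating')

# Optional: replace NaN with 0 and convert back to integers.
filled = pivoted.fillna(0).astype(int)
print(filled.to_numpy())
```

Whether zero is an acceptable stand-in for "no rating" depends on your data, so treat the `fillna(0)` step as optional.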
For a NumPy example, let's emulate the file with StringIO:
    from io import StringIO  # on Python 2: from StringIO import StringIO
    import numpy as np

    data = """1 23 3
    2 204 4
    1 492 2
    3 23 4"""
and load it:
    >>> arr = np.genfromtxt(StringIO(data), dtype=int)
    >>> arr
    array([[  1,  23,   3],
           [  2, 204,   4],
           [  1, 492,   2],
           [  3,  23,   4]])
The pivot is based on this answer:
    # unique labels, plus the position of each record's label among them
    rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
    cols, col_pos = np.unique(arr[:, 1], return_inverse=True)

    # start from zeros, then drop each rating into its (user, item) cell
    pivot_table = np.zeros((len(rows), len(cols)), dtype=arr.dtype)
    pivot_table[row_pos, col_pos] = arr[:, 2]
and the result:
    >>> pivot_table
    array([[3, 0, 2],
           [0, 4, 0],
           [4, 0, 0]])
Note that the results differ: in the second approach, missing values are set to zero instead of NaN.
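If you would rather have the NumPy result mark missing ratings the way pandas does, one option is to start from a NaN-filled float array instead of zeros. This is a minimal sketch on the same sample rows; `pivot_nan` is just an illustrative name.

```python
import numpy as np

# Same sample rows as above: user, item, rating.
arr = np.array([[1, 23, 3],
                [2, 204, 4],
                [1, 492, 2],
                [3, 23, 4]])

rows, row_pos = np.unique(arr[:, 0], return_inverse=True)
cols, col_pos = np.unique(arr[:, 1], return_inverse=True)

# NaN requires a float dtype, so the ratings are stored as floats here.
pivot_nan = np.full((len(rows), len(cols)), np.nan)
pivot_nan[row_pos, col_pos] = arr[:, 2]
print(pivot_nan)
```

Zeros are fine as long as 0 can never be a real rating; otherwise NaN keeps "not rated" distinct from "rated 0".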
Select one that suits you better ;)