Reverse sort and argsort in python
I don't think there's any real need to skip the toarray() call. The v array will be only n_docs long, which is dwarfed by the size of the n_docs × n_terms tf-idf matrix in practical situations. Also, it will be quite dense, since any term shared by two documents gives them a non-zero similarity. Sparse matrix representations only pay off when the matrix you're storing is very sparse (I've seen figures of >80% sparsity for MATLAB and assume that SciPy will be similar, though I don't have an exact figure).
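To put rough numbers on the size argument, here is a back-of-the-envelope sketch. The corpus dimensions are made up for illustration (they are not from the question), and float64 storage is assumed:

```python
import numpy as np

# Hypothetical corpus size, chosen only for illustration.
n_docs, n_terms = 10_000, 50_000
itemsize = np.dtype(np.float64).itemsize  # bytes per float64 entry

# Memory for a dense similarity vector versus a dense tf-idf matrix.
dense_vector_bytes = n_docs * itemsize
dense_matrix_bytes = n_docs * n_terms * itemsize

print(dense_vector_bytes)                        # 80000
print(dense_matrix_bytes // dense_vector_bytes)  # 50000
```

The dense vector is smaller than the dense matrix by a factor of n_terms, so densifying it costs next to nothing by comparison.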
The double sort can be skipped by doing
v = v.toarray()
vi = np.argsort(v, axis=0)[::-1]
vs = v[vi]
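As a small self-contained illustration of the single-argsort approach, here is a sketch with made-up scores. The values and the names scores, order and ranked are mine, not from the question. I flatten the column to 1-D first so that the indexing result is also plain 1-D:

```python
import numpy as np

# Hypothetical similarity scores for five documents, as an (n_docs, 1) column.
v = np.array([[0.1], [0.7], [0.0], [1.0], [0.3]])

# Flatten to 1-D so that fancy indexing returns a plain 1-D result.
scores = v.ravel()

# One ascending argsort, reversed, gives the indices in descending score order.
order = np.argsort(scores)[::-1]

# Indexing with those indices yields the scores themselves, sorted descending.
ranked = scores[order]

print(order)   # document indices, best match first
print(ranked)  # the corresponding scores
```

Note that reversing an ascending argsort also reverses the relative order of tied scores, which may matter if you need a stable ranking.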
Btw., your use of np.inner on sparse matrices is not going to work with the latest versions of NumPy; the safe way of taking an inner product of two sparse matrices is
v = tfidf * tfidf[idx, :].transpose()
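Here is a minimal end-to-end sketch of that product on a toy matrix. The data and idx are invented for illustration, and I'm assuming tfidf is a scipy.sparse.csr_matrix. I use the @ operator because it means matrix multiplication for both the sparse matrix and the newer sparse array classes, whereas * is element-wise on sparse arrays:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4-document, 5-term matrix standing in for a real tf-idf matrix.
tfidf = csr_matrix(np.array([
    [1.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 0.0],
]))
idx = 0  # hypothetical index of the query document

# (n_docs x n_terms) @ (n_terms x 1) -> (n_docs x 1), still a sparse matrix.
v = tfidf @ tfidf[idx, :].transpose()

# Densify the small result and rank documents by descending similarity.
scores = v.toarray().ravel()
order = np.argsort(scores)[::-1]

print(v.shape)
print(scores)
```

The result has one row per document, which is why the argsort in the earlier snippet runs along axis=0.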