Using Sklearn's TfidfVectorizer transform


If you want to compute tf-idf only for a given vocabulary, pass it as the vocabulary argument of the TfidfVectorizer constructor:

from sklearn.feature_extraction.text import TfidfVectorizer

vocabulary = "a list of words I want to look for in the documents".split()
vect = TfidfVectorizer(sublinear_tf=True, max_df=0.5, analyzer='word',
                       stop_words='english', vocabulary=vocabulary)
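
As a rough illustration of what a fixed vocabulary does, the sketch below uses a made-up three-document corpus and a made-up three-word vocabulary (both are assumptions for the example, not part of the question). The resulting matrix should have exactly one column per vocabulary word, in the order the words were given, and words outside the vocabulary are ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical vocabulary and corpus, for illustration only.
vocabulary = ["cat", "dog", "fish"]
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
]

vect = TfidfVectorizer(vocabulary=vocabulary)
tfidf = vect.fit_transform(corpus)

# One row per document, one column per vocabulary word.
print(tfidf.shape)
# The column index of each word follows its position in the list.
print(vect.vocabulary_)
# The third document contains no vocabulary word, so its row is all zeros.
print(tfidf.toarray())
```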

Then, to fit the vectorizer on a given corpus (an iterable of documents), i.e. to learn the idf weights from it, use fit:

vect.fit(corpus)
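
To see what fit learns, one can inspect the fitted vectorizer's idf_ attribute, which holds one inverse-document-frequency weight per vocabulary term. A small sketch, with a hypothetical corpus and vocabulary chosen only for the example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus and vocabulary, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
]
vect = TfidfVectorizer(vocabulary=["cat", "dog", "fish"])
vect.fit(corpus)

# idf_ has one weight per vocabulary term; terms that occur in
# fewer documents get larger weights.
for word, index in vect.vocabulary_.items():
    print(word, vect.idf_[index])
```

Here "cat" appears in two documents, "dog" in one and "fish" in none, so the printed weights should increase in that order.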

The fit_transform method is shorthand for:

vect.fit(corpus)
corpus_tf_idf = vect.transform(corpus)
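
A quick way to convince yourself of this equivalence is to compare the two results numerically. The sketch below does so on a hypothetical corpus and vocabulary (assumptions for the example only), using two separate vectorizer instances so that neither call affects the other:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus and vocabulary, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
]
vocabulary = ["cat", "dog", "fish"]

# Two-step version: fit, then transform the same corpus.
two_step = TfidfVectorizer(vocabulary=vocabulary)
two_step.fit(corpus)
separate = two_step.transform(corpus)

# One-step version on a fresh vectorizer.
combined = TfidfVectorizer(vocabulary=vocabulary).fit_transform(corpus)

# Compare the dense forms of the two sparse matrices.
same = np.allclose(separate.toarray(), combined.toarray())
print(same)
```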

Finally, the transform method accepts a corpus, so a single document must be passed wrapped in a list. Otherwise the string is treated as an iterable of characters, each character being a document.

doc_tfidf = vect.transform([doc])
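
The following sketch shows the single-document case on a hypothetical corpus, vocabulary and document (all assumptions for the example). It also tries passing the bare string: as far as I know, recent scikit-learn versions reject this with a ValueError rather than silently iterating over characters, but that behavior may depend on the installed version.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus and vocabulary, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
]
vect = TfidfVectorizer(vocabulary=["cat", "dog", "fish"])
vect.fit(corpus)

doc = "my dog likes fish"

# Wrap the single document in a list: the result has exactly one row.
doc_tfidf = vect.transform([doc])
print(doc_tfidf.shape)
print(doc_tfidf.toarray())

# Passing the bare string is a mistake; report how this version reacts.
try:
    vect.transform(doc)
    print("bare string was accepted by this scikit-learn version")
except ValueError as err:
    print("bare string rejected:", err)
```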