Simple Python implementation of collaborative topic modeling?


This should get you started (not sure why it hasn't been posted yet): https://github.com/arongdari/python-topic-model

More specifically: https://github.com/arongdari/python-topic-model/blob/master/ptm/collabotm.py

class CollaborativeTopicModel:
    """
    Wang, Chong, and David M. Blei. "Collaborative topic modeling for
    recommending scientific articles." Proceedings of the 17th ACM SIGKDD
    International Conference on Knowledge Discovery and Data Mining. ACM, 2011.

    Attributes
    ----------
    n_item: int
        number of items
    n_user: int
        number of users
    R: ndarray, shape (n_user, n_item)
        user x item rating matrix
    """

Looks nice and straightforward. I still suggest at least looking at gensim. Radim has done a fantastic job of optimizing that software.
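To see the core idea of the paper, here is a minimal NumPy sketch (toy sizes and random values, not the actual fitting code from the repo): each item's latent vector is its LDA topic proportions theta_j plus a per-item offset epsilon_j learned from the ratings, and a predicted rating is the dot product with the user's latent vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_user, n_item, n_topic = 4, 6, 3

# Topic proportions per item (each row sums to 1) -- stand-in for LDA output
theta = rng.dirichlet(np.ones(n_topic), size=n_item)      # (n_item, n_topic)

# CTM: item latent vector = topic proportions + learned offset epsilon
epsilon = 0.1 * rng.standard_normal((n_item, n_topic))
v = theta + epsilon                                       # item latent vectors

u = rng.standard_normal((n_user, n_topic))                # user latent vectors

# Predicted rating matrix: r_ij = u_i . v_j
R_hat = u @ v.T                                           # (n_user, n_item)
print(R_hat.shape)
```

The offset epsilon is what lets an item's recommendation profile deviate from its pure topic content once rating data comes in; with no ratings, predictions fall back to the topics alone.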


A very simple LDA implementation using gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html

I hope it helps.

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import RSLPStemmer
from gensim import corpora
import gensim

st = RSLPStemmer()
texts = []

doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
doc2 = "A follower of either the diet or the philosophy is known as a vegan."
doc3 = "Distinctions are sometimes made between several categories of veganism."
doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)."

docs = [doc1, doc2, doc3, doc4, doc5]

for doc in docs:
    # tokenize, drop English stop words, then stem
    tokens = word_tokenize(doc.lower())
    stopped_tokens = [w for w in tokens if w not in stopwords.words('english')]
    stemmed_tokens = [st.stem(w) for w in stopped_tokens]
    texts.append(stemmed_tokens)

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model using gensim
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
print(ldamodel.print_topics(num_topics=2, num_words=4))

[(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*dietary + 0.028*veg')]