Understanding LDA implementation using gensim

python gensim lda topic-modeling dirichlet

The answer you're looking for is in the gensim tutorial. lda.printTopics(k) prints the most contributing words for k randomly selected topics (in newer gensim releases the method is named print_topics). One can assume that this is (partially) the distribution of words over each of the given topics: each number is the probability of that word appearing in the topic named at the start of the line.

Usually, one would run LDA on a large corpus. Running LDA on a ridiculously small sample won't give the best results.


I think this tutorial will help you understand everything very clearly - https://www.youtube.com/watch?v=DDq3OVp9dNA

I too faced a lot of problems understanding it at first. I'll try to outline a few points in a nutshell.

In Latent Dirichlet Allocation,

The order of words is not important in a document - Bag of Words model.
A document is a distribution over topics
Each topic, in turn, is a distribution over words belonging to the vocabulary
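LDA is a probabilistic generative model. It is used to infer the hidden variables (the topic mixture of each document and the word distribution of each topic) from their posterior distribution given the observed words.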

Imagine the process of creating a document to be something like this -

Choose a distribution over topics for the document
For each word position, draw a topic from that distribution, then draw a word from that topic. Repeat this for every word in the document
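The generative story can be sketched in a few lines of plain Python. This is only an illustration: the two topics, their word probabilities and the alpha values are invented for the example, and the topic/word draw is repeated once per word position in the document.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one bag of words following the LDA generative story."""
    # 1. Choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, rng)
    names = list(topics)
    words = []
    for _ in range(n_words):
        # 2. Draw a topic for this word position...
        topic = rng.choices(names, weights=theta)[0]
        # 3. ...then draw a word from that topic's distribution over words.
        vocab = list(topics[topic])
        probs = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics: each maps words to probabilities that sum to 1.
topics = {
    "shopping": {"amazon": 0.4, "sells": 0.3, "things": 0.3},
    "tech": {"apple": 0.4, "microsoft": 0.3, "nokia": 0.3},
}

rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], n_words=8, rng=rng)
print(theta)  # the document's topic mixture
print(words)  # the generated bag of words
```

Training LDA goes the other way: it only sees the words and has to estimate theta and the topic-word probabilities.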

LDA sort of backtracks along this line: given that you have a bag of words representing a document, what could be the topics it represents?

So, in your case, the first topic (0)

INFO : topic #0: 0.181*things + 0.181*amazon + 0.181*many + 0.181*sells + 0.031*nokia + 0.031*microsoft + 0.031*apple + 0.031*announces + 0.031*acquisition + 0.031*product

is more about things, amazon, many and sells, as they have a higher weight (0.181 each), and not so much about microsoft or apple, which have a significantly lower weight (0.031).
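If you want to check this by code rather than by eye, a small sketch like the following pulls the weights out of the logged line. The helper name parse_topic and the 0.1 cutoff are my own choices for illustration, not part of gensim.

```python
import re

# The topic #0 line from the log above.
line = ("0.181*things + 0.181*amazon + 0.181*many + 0.181*sells + 0.031*nokia "
        "+ 0.031*microsoft + 0.031*apple + 0.031*announces + 0.031*acquisition "
        "+ 0.031*product")

def parse_topic(text):
    """Turn 'weight*word + weight*word ...' into a list of (word, weight) pairs."""
    # The optional quotes also cover newer gensim output such as 0.181*"things".
    pairs = re.findall(r'([0-9.]+)\*"?([^\s"+]+)"?', text)
    return [(word, float(weight)) for weight, word in pairs]

weights = parse_topic(line)

# Words that dominate the topic (arbitrary cutoff of 0.1).
strong = [word for word, p in weights if p > 0.1]
print(strong)

# Total probability mass covered by the words that were printed.
print(round(sum(p for _, p in weights), 3))
```

The printed weights add up to roughly 0.91 rather than 1, because the log line only shows the top words of the topic.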

I would suggest reading this blog for a much better understanding (Edwin Chen is a genius!) - http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/


Since the above answers were posted, there are now some very nice visualization tools for gaining an intuition of LDA using gensim.

Take a look at the pyLDAvis package. Here is a great notebook overview. And here is a very helpful video description geared toward the end user (9 min tutorial).

Hope this helps!
