LDA model generates different topics every time I train on the same corpus



Why do the same LDA parameters and corpus generate different topics every time?

Because LDA uses randomness in both the training and inference steps.

And how do I stabilize the topic generation?

By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, with numpy.random.seed:

SOME_FIXED_SEED = 42

# before training/inference:
np.random.seed(SOME_FIXED_SEED)

(This is ugly, and it makes Gensim results hard to reproduce; consider submitting a patch. I've already opened an issue.)
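A minimal sketch of why reseeding works: if NumPy's global random state is reset to the same seed before each run, the sequence of random draws (and therefore any seed-dependent initialization) is identical. This toy example uses plain NumPy draws as a stand-in for LDA's internal randomness; the names `SOME_FIXED_SEED` and `draw` are illustrative only.

```python
import numpy as np

SOME_FIXED_SEED = 42

def draw():
    # reseed before every "run", mimicking reseeding before each training
    np.random.seed(SOME_FIXED_SEED)
    return np.random.rand(3)

a = draw()
b = draw()
# same seed => identical sequence of random numbers
assert np.allclose(a, b)
```

The same principle applies to gensim's LDA: any code path that consumes `numpy.random` will produce the same results once the seed is fixed.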


Set the random_state parameter when initializing the LdaModel() method.

lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                            id2word=id2word,
                                            num_topics=num_topics,
                                            random_state=1,
                                            passes=num_passes,
                                            alpha='auto')


I had the same problem, even with about 50,000 comments. But you can get much more consistent topics by increasing the number of iterations the LDA runs for. The default is 50; when I raised it to 300, it usually gave me the same results, probably because the model is much closer to convergence.

Specifically, you just add the following option:

ldamodel.LdaModel(corpus, ..., iterations=<your desired iterations>)