LDA model generates different topics every time I train on the same corpus



Why do the same LDA parameters and corpus generate different topics every time?

Because LDA uses randomness in both the training and inference steps.

And how do I stabilize the topic generation?

By resetting the numpy.random seed to the same value every time a model is trained or inference is performed, with numpy.random.seed:

SOME_FIXED_SEED = 42

# before training/inference:
np.random.seed(SOME_FIXED_SEED)

(This is ugly, and it makes Gensim results hard to reproduce; consider submitting a patch. I've already opened an issue.)
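A minimal sketch of why reseeding works: if NumPy's global random state is reset to the same seed before each run, the sequence of random draws (and therefore any seed-dependent initialization) is identical. This toy example uses plain NumPy draws as a stand-in for LDA's internal randomness; the names `SOME_FIXED_SEED` and `draw` are illustrative only.

```python
import numpy as np

SOME_FIXED_SEED = 42

def draw():
    # reseed before every "run", mimicking reseeding before each training
    np.random.seed(SOME_FIXED_SEED)
    return np.random.rand(3)

a = draw()
b = draw()
# same seed => identical sequence of random numbers
assert np.allclose(a, b)
```

The same principle applies to gensim's LDA: any code path that consumes `numpy.random` will produce the same results once the seed is fixed.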


Set the random_state parameter when initializing the LdaModel() method.

lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                            id2word=id2word,
                                            num_topics=num_topics,
                                            random_state=1,
                                            passes=num_passes,
                                            alpha='auto')


I had the same problem, even with about 50,000 comments. But you can get much more consistent topics by increasing the number of iterations the LDA runs for. The default is 50; when I raised it to 300, it usually gave me the same results, probably because the model is much closer to convergence.

Specifically, you just add the following option:

ldamodel.LdaModel(corpus, ..., iterations=<your desired iterations>)