How to print the LDA topic models from gensim? Python
After some messing around, it seems that print_topics(numoftopics) for the ldamodel has a bug: it prints None. My workaround is to call print_topic(topicid) for each topic instead:
>>> print lda.print_topics()
None
>>> for i in range(lda.num_topics):
...     print lda.print_topic(i)
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...
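If all you have is the formatted string that print_topic returns, a small parser can turn it back into (weight, word) pairs. This is only a sketch: parse_topic_string is a hypothetical helper, not part of gensim, and it assumes the "weight*word + weight*word" layout shown above. Some gensim versions wrap each word in double quotes, so the sketch strips those too.

```python
def parse_topic_string(topic_string):
    """Split '0.083*response + 0.083*interface' into [(0.083, 'response'), ...].

    Hypothetical helper, not part of gensim. Assumes terms are joined by ' + '
    and each term has the form weight*word (the word may be double-quoted).
    """
    pairs = []
    for term in topic_string.split(" + "):
        weight, _, word = term.partition("*")
        pairs.append((float(weight), word.strip().strip('"')))
    return pairs


# Example string copied from the first terms of the output above.
example = "0.083*response + 0.083*interface + 0.083*time"
for weight, word in parse_topic_string(example):
    print("{:<10} {:.3f}".format(word, weight))
```

With a real model you would pass lda.print_topic(i) in place of the example string.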
I think the syntax of show_topics has changed over time:
show_topics(num_topics=10, num_words=10, log=False, formatted=True)
For num_topics number of topics, return num_words most significant words (10 words per topic, by default).
The topics are returned as a list – a list of strings if formatted is True, or a list of (probability, word) 2-tuples if False.
If log is True, also output this result to log.
Unlike LSA, there is no natural ordering between the topics in LDA. The returned num_topics <= self.num_topics subset of all topics is therefore arbitrary and may change between two LDA training runs.
I think it is always more helpful to see the topics as a list of words. The following code snippet helps achieve that goal. I assume you already have an LDA model called lda_model.
for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
    print('Topic: {} \nWords: {}'.format(idx, [w[0] for w in topic]))
In the above code, I have decided to show the first 30 words belonging to each topic. For simplicity, only the first two topics I get are shown.
Topic: 0
Words: ['associate', 'incident', 'time', 'task', 'pain', 'amcare', 'work', 'ppe', 'train', 'proper', 'report', 'standard', 'pmv', 'level', 'perform', 'wear', 'date', 'factor', 'overtime', 'location', 'area', 'yes', 'new', 'treatment', 'start', 'stretch', 'assign', 'condition', 'participate', 'environmental']
Topic: 1
Words: ['work', 'associate', 'cage', 'aid', 'shift', 'leave', 'area', 'eye', 'incident', 'aider', 'hit', 'pit', 'manager', 'return', 'start', 'continue', 'pick', 'call', 'come', 'right', 'take', 'report', 'lead', 'break', 'paramedic', 'receive', 'get', 'inform', 'room', 'head']
I don't really like the way the above topics look, so I usually modify my code as shown:
for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
    print('Topic: {} \nWords: {}'.format(idx, '|'.join([w[0] for w in topic])))
... and the output (first 2 topics shown) will look like this:
Topic: 0
Words: associate|incident|time|task|pain|amcare|work|ppe|train|proper|report|standard|pmv|level|perform|wear|date|factor|overtime|location|area|yes|new|treatment|start|stretch|assign|condition|participate|environmental
Topic: 1
Words: work|associate|cage|aid|shift|leave|area|eye|incident|aider|hit|pit|manager|return|start|continue|pick|call|come|right|take|report|lead|break|paramedic|receive|get|inform|room|head
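If you want to reuse this formatting, the loop can be wrapped in a small function. This is a minimal sketch under assumptions, not gensim API: format_topics is a hypothetical name, and it assumes show_topics(formatted=False) returns (topic_id, [(word, weight), ...]) pairs, which is the shape the loops above rely on. The sample data and its weights are made up so the snippet runs without a trained model.

```python
def format_topics(topics, sep="|", with_weights=False):
    """Return one 'Topic: N / Words: ...' string per topic.

    `topics` is assumed to look like the result of
    lda_model.show_topics(formatted=False):
    [(topic_id, [(word, weight), ...]), ...].
    """
    lines = []
    for idx, terms in topics:
        if with_weights:
            words = ["{} ({:.3f})".format(word, weight) for word, weight in terms]
        else:
            words = [word for word, _ in terms]
        lines.append("Topic: {} \nWords: {}".format(idx, sep.join(words)))
    return lines


# Made-up sample in the assumed shape; with a real model you would pass
# lda_model.show_topics(formatted=False, num_words=30) instead.
sample_topics = [
    (0, [("associate", 0.031), ("incident", 0.027), ("time", 0.019)]),
    (1, [("work", 0.042), ("associate", 0.025), ("cage", 0.018)]),
]

for line in format_topics(sample_topics):
    print(line)
```

Passing with_weights=True appends each word's weight in parentheses, which can help when you want to see how dominant the top words are.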