How to print the LDA topics models from gensim? Python

python nlp lda topic-modeling gensim

After some messing around, it seems that print_topics(numoftopics) on the LDA model is buggy: it prints None. My workaround is to call print_topic(topicid) for each topic:

>>> print(lda.print_topics())
None
>>> for i in range(lda.num_topics):
...     print(lda.print_topic(i))
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...


I think the syntax of show_topics has changed over time:

show_topics(num_topics=10, num_words=10, log=False, formatted=True)

For num_topics number of topics, return num_words most significant words (10 words per topic, by default).

The topics are returned as a list – a list of strings if formatted is True, or a list of (probability, word) 2-tuples if False.

If log is True, also output this result to log.

Unlike LSA, there is no natural ordering between the topics in LDA. The returned num_topics <= self.num_topics subset of all topics is therefore arbitrary and may change between two LDA training runs.
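Because the return shape seems to differ between versions, a small normalising helper can make downstream code less fragile. The sketch below is only an illustration and does not call gensim: it assumes the unformatted output is either a list of (topic_id, [(word, probability), ...]) pairs, which is what the loop-based snippets in this thread rely on, or a plain list of [(probability, word), ...] lists as in the docstring quoted above. The name topic_words and the sample data are hypothetical.

```python
def topic_words(shown_topics):
    """Map topic id -> list of words for show_topics(formatted=False)-style data.

    Assumed input shapes (not verified against any particular gensim version):
      * [(topic_id, [(word, prob), ...]), ...]
      * [[(prob, word), ...], ...]   (topic id taken from the list position)
    """
    result = {}
    for position, entry in enumerate(shown_topics):
        is_id_pair = (
            isinstance(entry, tuple)
            and len(entry) == 2
            and not isinstance(entry[0], (tuple, list))
        )
        if is_id_pair:
            topic_id, pairs = entry
        else:
            topic_id, pairs = position, entry
        # In each pair, the word is whichever element is a string.
        result[topic_id] = [a if isinstance(a, str) else b for a, b in pairs]
    return result


# Made-up sample data in both assumed shapes.
with_ids = [(0, [("user", 0.1), ("system", 0.08)]), (1, [("trees", 0.2), ("graph", 0.1)])]
without_ids = [[(0.1, "user"), (0.08, "system")], [(0.2, "trees"), (0.1, "graph")]]

print(topic_words(with_ids))
print(topic_words(without_ids))
```

With a real model you would pass in lda.show_topics(formatted=False) instead of the sample lists.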


I think it is always more helpful to see the topics as a list of words. The following code snippet helps achieve that goal. I assume you already have an LDA model called lda_model.

for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
    print('Topic: {} \nWords: {}'.format(idx, [w[0] for w in topic]))

In the above code, I have decided to show the first 30 words belonging to each topic. For simplicity, I show only the first two topics I get.

Topic: 0
Words: ['associate', 'incident', 'time', 'task', 'pain', 'amcare', 'work', 'ppe', 'train', 'proper', 'report', 'standard', 'pmv', 'level', 'perform', 'wear', 'date', 'factor', 'overtime', 'location', 'area', 'yes', 'new', 'treatment', 'start', 'stretch', 'assign', 'condition', 'participate', 'environmental']
Topic: 1
Words: ['work', 'associate', 'cage', 'aid', 'shift', 'leave', 'area', 'eye', 'incident', 'aider', 'hit', 'pit', 'manager', 'return', 'start', 'continue', 'pick', 'call', 'come', 'right', 'take', 'report', 'lead', 'break', 'paramedic', 'receive', 'get', 'inform', 'room', 'head']

I don't really like the way the above topics look, so I usually modify my code as shown:

for idx, topic in lda_model.show_topics(formatted=False, num_words=30):
    print('Topic: {} \nWords: {}'.format(idx, '|'.join([w[0] for w in topic])))

... and the output (first 2 topics shown) will look like this:

Topic: 0
Words: associate|incident|time|task|pain|amcare|work|ppe|train|proper|report|standard|pmv|level|perform|wear|date|factor|overtime|location|area|yes|new|treatment|start|stretch|assign|condition|participate|environmental
Topic: 1
Words: work|associate|cage|aid|shift|leave|area|eye|incident|aider|hit|pit|manager|return|start|continue|pick|call|come|right|take|report|lead|break|paramedic|receive|get|inform|room|head
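If you need this formatting in more than one place, it may be worth wrapping it in a function that returns a string instead of printing. The following is a rough sketch rather than anything gensim provides: format_topics, its sep and with_weights parameters, and the sample weights are all hypothetical, and it assumes each item is a (topic_id, [(word, weight), ...]) pair as in the loops above.

```python
def format_topics(shown_topics, sep="|", with_weights=False):
    """Build a 'Topic: N / Words: ...' report from (topic_id, [(word, weight), ...]) items."""
    lines = []
    for topic_id, pairs in shown_topics:
        if with_weights:
            # Append each word's weight, rounded to three decimals.
            parts = ["{}:{:.3f}".format(word, weight) for word, weight in pairs]
        else:
            parts = [word for word, _ in pairs]
        lines.append("Topic: {}\nWords: {}".format(topic_id, sep.join(parts)))
    return "\n".join(lines)


# Made-up weights, for illustration only.
sample = [(0, [("associate", 0.05), ("incident", 0.04)]),
          (1, [("work", 0.06), ("cage", 0.03)])]

print(format_topics(sample))
print(format_topics(sample, sep=", ", with_weights=True))
```

Returning a string makes it easy to write the report to a file or a logger as well as to the console.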
