Python: Semantic similarity score for Strings [duplicate] Python: Semantic similarity score for Strings [duplicate] python python

Python: Semantic similarity score for Strings [duplicate]


The best package I've seen for this is Gensim, found at the Gensim Homepage. I've used it many times, and overall been very happy with it's ease of use; it is written in Python, and has an easy to follow tutorial to get you started, which compares 9 strings. It can be installed via pip, so you won't have a lot of hassle getting it installed I hope.

Which scoring algorithm you use depends heavily on the context of your problem, but I'd suggest starting of with the LSI functionality if you want something basic. (That's what the tutorial walks you through.)

If you go through the tutorial for gensim, it will walk you through comparing two strings, using the Similarities function. This will allow you to see how your stings compare to each other, or to some other sting, on the basis of the text they contain.

If you're interested in the science behind how it works, check out this paper.


Unfortunately, I cannot help you with PY but you may take a look at my old project that uses dictionaries to accomplish the Semantic comparisons between the sentences (which can later be coded in PY implementing the vector-space analysis). It should be just a few hrs of coding to translate from JAVA to PY.https://sourceforge.net/projects/semantics/