
How to build semantic search for a given domain


I would highly recommend watching Trey Grainger's lecture on how to build a semantic search system: https://www.youtube.com/watch?v=4fMZnunTRF8. He covers the anatomy of a semantic search system and how the individual pieces fit together to deliver a final solution.

A great example of contextual similarity is Bing's search engine.

The original query had the terms {canned soda}, and Bing's search results can refer to {canned diet soda}, {soft drinks}, {unopened room temperature pop} or {carbonated drinks}. How did Bing do this?

Well, words that have similar meanings get similar vectors, and these vectors can then be projected onto a 2-dimensional graph to be easily visualized. The vectors are trained so that words with similar meanings end up near each other in the vector space. You can train your own vector-based model by training a GloVe model.

The smaller the distance between two vectors, the more similar their meanings. You can now run nearest-neighbour queries based on the distance between vectors. For example, for the query {how to stop animals from destroying my garden}, the nearest neighbours give these results:

[image: nearest-neighbour results for this query]

You can learn more about it here. For your case, you can choose a threshold for the maximum distance a sentence's vector may be from the vector of the original search query for the sentence to be considered contextually similar.
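The thresholding idea can be sketched roughly as follows. This is only an illustration under several assumptions: the 3-dimensional word vectors are made up (real ones would come from a trained model such as GloVe), a sentence vector is formed by simply averaging its word vectors, cosine distance is used as the distance measure, and the names `WORD_VECTORS`, `sentence_vector`, `cosine_distance`, `search` and the threshold value are hypothetical.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# In practice these would be loaded from a trained model such as GloVe.
WORD_VECTORS = {
    "soda":    [0.9, 0.1, 0.0],
    "pop":     [0.8, 0.2, 0.1],
    "drinks":  [0.7, 0.3, 0.0],
    "canned":  [0.6, 0.0, 0.3],
    "garden":  [0.0, 0.9, 0.2],
    "animals": [0.1, 0.8, 0.4],
    "fence":   [0.0, 0.7, 0.5],
}


def sentence_vector(sentence):
    """Average the vectors of the known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query, sentences, max_distance=0.2):
    """Return the sentences within max_distance of the query, closest first."""
    query_vec = sentence_vector(query)
    if query_vec is None:
        return []
    hits = []
    for sentence in sentences:
        vec = sentence_vector(sentence)
        if vec is None:
            continue  # no known words, so nothing to compare
        distance = cosine_distance(query_vec, vec)
        if distance <= max_distance:
            hits.append((distance, sentence))
    return [sentence for _, sentence in sorted(hits)]


candidates = ["soft drinks", "unopened pop", "garden fence", "animals in the garden"]
print(search("canned soda", candidates))
```

A suitable threshold depends on the vectors and the data, so it is usually tuned by checking the results on a handful of queries whose relevant sentences are known.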

Contextual similarity can also be approached by reducing the dimensionality of the vocabulary space with something like LSI (Latent Semantic Indexing). To do this in Python, I would highly suggest you check out the gensim library: https://radimrehurek.com/gensim/about.html.


You might be interested in looking into Weaviate to help you solve this problem. It is a smart graph based on the vectorization of data objects.

If you have domain-specific language (e.g., abbreviations) you can extend Weaviate with custom concepts.

You might be able to solve your problem with the semantic search features (i.e., Explore{}) or the automatic classification features.

Explore function

Because all data objects get vectorized, you can do a semantic search like the following (this example comes from the docs; you can try it out here using GraphQL):

{
  Get{
    Things{
      Publication(
        explore: {
          concepts: ["fashion"],
          certainty: 0.7,
          moveAwayFrom: {
            concepts: ["finance"],
            force: 0.45
          },
          moveTo: {
            concepts: ["haute couture"],
            force: 0.85
          }
        }
      ){
        name
      }
    }
  }
}

If you structure your graph schema around, for example, a class named "Sentence", a similar query might look something like this:

{
  Get{
    Things{
      Sentence(
        # Explore (i.e., semantically) for "Buying Experience"
        explore: {
          concepts: ["Buying Experience"]
        }
        # Result must include the word "car"
        where: {
          operator: Like
          path: ["content"]
          valueString: "*car*"
        }
      ){
        content
      }
    }
  }
}
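If you want to issue a query like this from Python, a rough sketch using only the standard library could look as follows. The helper names `build_sentence_query` and `run_query` are hypothetical, the query mirrors the syntax shown in this answer (which may differ between Weaviate versions), and the endpoint URL in the comment is an assumption about a local instance, so check it against your own setup.

```python
import json
import urllib.request


def build_sentence_query(concept, keyword):
    """Build the GraphQL text: explore for `concept`, keep results containing `keyword`."""
    # json.dumps produces a double-quoted, escaped string literal,
    # which also works as a GraphQL string for simple inputs.
    return """{
  Get {
    Things {
      Sentence(
        explore: { concepts: [%s] }
        where: { operator: Like, path: ["content"], valueString: %s }
      ) {
        content
      }
    }
  }
}""" % (json.dumps(concept), json.dumps("*" + keyword + "*"))


def run_query(endpoint, query):
    """POST the query as JSON to a GraphQL endpoint and return the decoded response."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


query = build_sentence_query("Buying Experience", "car")
print(query)
# Assumed local endpoint; adjust to your own instance before calling:
# result = run_query("http://localhost:8080/v1/graphql", query)
```

Keeping the query construction separate from the HTTP call makes it easy to inspect the generated GraphQL before sending it anywhere.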

Note:
You can also explore the graph semantically as a whole.

Automatic classification

An alternative might be working with the contextual or KNN classification features.

In your case, you might use the class Sentence and relate it to a class called Experience, which would have the property buying (there are, of course, many other configurations and strategies you can choose from).

PS:
This video gives a bit more context if you like.


As far as I know, no theoretical model exists for building a semantic search engine. Rather, I believe a semantic search engine should be designed around the specific requirements at hand. That said, any semantic search engine that successfully understands both the user's intent and the context of the search term needs natural language processing (NLP) and machine learning as its building blocks.

Even though search engines work differently from search tools, you can look at enterprise search tools to get an idea of a semantic search model that works. Newer platforms like 3RDi Search are built on the principles of semantic search and target the unstructured data that enterprises have to deal with. Google is very likely working on a model to introduce advanced semantics into its search engine.