search functionality on multi-language django site


This is more of a starting point than a full solution, but I hope it helps and that other users can improve this idea and reach a better solution.

When using Haystack to index a multilingual site (built with django-transmeta or django-multilingual), you face two problems:

  1. how to index the content for all the languages
  2. how to run the query against the correct index field depending on the selected language

1) Index the content for all the languages

Create a separate field for each language in every SearchIndex model, using a common prefix and the language code:

    text_en = indexes.CharField(model_attr='body_en', document=True)
    text_pt = indexes.CharField(model_attr='body_pt')

If you want to index several model fields you can obviously use a template. Only one of the index fields can have document=True.

If you need a pre-rendered field (see http://haystacksearch.org/docs/searchindex_api.html) for faster display, you should create one for each language (i.e. rendered_en, rendered_pt).
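With more than a couple of languages, writing every field by hand gets repetitive. As a minimal sketch, the field names and model attributes could be derived from a language list. `LANGUAGES`, `DOCUMENT_LANGUAGE` and `language_field_specs` are hypothetical names, not part of Haystack, and the `body_<code>` attribute naming is assumed from the example above.

```python
# Hypothetical helper: derive per-language index field definitions.
LANGUAGES = ("en", "pt")        # assumed language codes for the site
DOCUMENT_LANGUAGE = "en"        # the single field that gets document=True


def language_field_specs(prefix="text", attr_prefix="body",
                         languages=LANGUAGES, document=DOCUMENT_LANGUAGE):
    """Return {field_name: kwargs} with one index field per language."""
    specs = {}
    for code in languages:
        kwargs = {"model_attr": "%s_%s" % (attr_prefix, code)}
        if code == document:
            # Only one field per SearchIndex may be the document field.
            kwargs["document"] = True
        specs["%s_%s" % (prefix, code)] = kwargs
    return specs


if __name__ == "__main__":
    for name, kwargs in sorted(language_field_specs().items()):
        print(name, kwargs)
```

Each entry corresponds to one hand-written line such as text_pt = indexes.CharField(model_attr='body_pt'). Calling the helper again with prefix="rendered" would give the matching names for the pre-rendered fields.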

2) Querying the correct index

The default Haystack auto_query method is programmed to receive a "q" query parameter on the request and search the "content" index field (the one marked as document=True) in all the SearchIndex models. Only one of the fields can have document=True, and I believe we can only have one SearchIndex for each Django model.

The simplest solution, using the common search form, is to create a multilingual SearchQuerySet that filters not on content but on text_<language code> (text being the prefix used in the SearchIndex model above):

    from django.conf import settings
    from django.utils.translation import get_language
    from haystack.query import SearchQuerySet, DEFAULT_OPERATOR

    class MlSearchQuerySet(SearchQuerySet):
        def filter(self, **kwargs):
            """Narrows the search based on certain attributes and the default operator."""
            if 'content' in kwargs:
                kwd = kwargs.pop('content')
                kwdkey = "text_%s" % str(get_language())
                kwargs[kwdkey] = kwd
            if getattr(settings, 'HAYSTACK_DEFAULT_OPERATOR', DEFAULT_OPERATOR) == 'OR':
                return self.filter_or(**kwargs)
            else:
                return self.filter_and(**kwargs)
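One caveat with building the field name straight from get_language(): Django can return a regional code such as pt-br, and recent Django versions return None when translations are deactivated, neither of which matches a field like text_pt. A minimal sketch of a normalizing helper follows. `field_for_language`, `INDEXED_LANGUAGES` and `FALLBACK_LANGUAGE` are hypothetical names, and falling back to a default language is an assumption about the behaviour you want.

```python
INDEXED_LANGUAGES = ("en", "pt")   # assumed: languages that have a text_<code> field
FALLBACK_LANGUAGE = "en"           # assumed: used when the active language is not indexed


def field_for_language(language_code, prefix="text"):
    """Map an active language code to the name of an index field."""
    # 'pt-br' or 'pt_BR' -> 'pt'; None or '' -> ''
    base = (language_code or "").replace("_", "-").split("-")[0].lower()
    if base not in INDEXED_LANGUAGES:
        base = FALLBACK_LANGUAGE
    return "%s_%s" % (prefix, base)
```

Inside the filter method, the key could then be computed as kwdkey = field_for_language(get_language()).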

and point your search URL to a view that uses this query set:

    from haystack.forms import ModelSearchForm
    from haystack.views import SearchView

    urlpatterns += patterns('haystack.views',
        url(r'^search/$', SearchView(
            searchqueryset=MlSearchQuerySet(),
            form_class=ModelSearchForm
        ), name='haystack_search_ml'),
    )

Now your search should be aware of the selected language.


I wrote a detailed explanation of how to do it here: http://anthony-tresontani.github.com/Django/2012/09/20/multilingual-search/

That implies writing a custom Solr engine (backend + query) and setting up multiple cores, one per language.
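The core selection at the heart of that approach can be sketched in a few lines. This is not the code from the linked post: the base URL, the core_<code> naming and the function name are all assumptions for illustration only.

```python
SOLR_BASE_URL = "http://127.0.0.1:8983/solr"          # assumed local Solr instance
LANGUAGE_CORES = {"en": "core_en", "pt": "core_pt"}   # hypothetical core names
DEFAULT_LANGUAGE = "en"                               # assumed fallback language


def solr_url_for(language_code):
    """Return the URL of the Solr core holding documents in this language."""
    # Reduce 'pt-br' to 'pt'; treat a missing code as the default language.
    base = (language_code or DEFAULT_LANGUAGE).split("-")[0].lower()
    core = LANGUAGE_CORES.get(base, LANGUAGE_CORES[DEFAULT_LANGUAGE])
    return "%s/%s" % (SOLR_BASE_URL, core)
```

A custom backend would connect to this URL instead of a single fixed one, both when indexing and when querying.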


There are a few commercial products, for example multilingual indexers for Solr or Lucene, that are capable of determining the language automatically.

I don't like commercial products but the idea is nice and simple - crawl the website, determine the language (with a meta tag, for example) and index.
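The "determine the language" step can be sketched with the Python standard library alone. The parser below reads the lang attribute of the html tag or a content-language meta tag. The class and function names are hypothetical, and a real crawler would need a fallback for pages that declare neither.

```python
from html.parser import HTMLParser


class LanguageSniffer(HTMLParser):
    """Collects the declared language of an HTML page, if any."""

    def __init__(self):
        super().__init__()
        self.language = None

    def handle_starttag(self, tag, attrs):
        if self.language is not None:
            return  # keep the first declaration found
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.language = attrs["lang"]
        elif (tag == "meta"
              and (attrs.get("http-equiv") or "").lower() == "content-language"):
            self.language = attrs.get("content")


def detect_language(html, default=None):
    """Return the primary language subtag ('pt' for 'pt-BR') or default."""
    sniffer = LanguageSniffer()
    sniffer.feed(html)
    if not sniffer.language:
        return default
    return sniffer.language.split("-")[0].strip().lower()
```

The detected code can then decide which per-language field or index a crawled page is written to.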

So choose the search engine and try to extend it to handle multilingual sites.

Good question though, let us know how you solved this.