Indexing/Searching "complex" JSON in elasticsearch
The one thing that is certain is that you first need to craft a custom mapping based on your specific data and your query needs. My advice is that contains_more should be of type nested, so that you can issue more precise queries on your fields.
I don't know the exact names of your fields, but based on what you showed, one possible mapping could be something like this.
    {
      "your_type_name": {
        "properties": {
          "foo": { "type": "string" },
          "metadata": {
            "type": "object",
            "properties": {
              "some_key": { "type": "string" },
              "someotherkey2": { "type": "string" },
              "more_data": {
                "type": "object",
                "properties": {
                  "contains_more": {
                    "type": "nested",
                    "properties": {
                      "foo": { "type": "string" },
                      "bar": { "type": "string" },
                      "baz": { "type": "string" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
Then, as already mentioned by mark in his comment, auto_query won't cut it, mainly because of the multiple nesting levels. As far as I know, Django/Haystack doesn't support nested queries out of the box, but you can extend Haystack to support them. Here is a blog post that explains how to tackle this: http://www.stamkracht.com/extending-haystacks-elasticsearch-backend. I'm not sure whether this helps, but give it a try and let us know if you need more help.
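To make the point about nested queries concrete, here is a minimal sketch of the request body such a query would need for the contains_more field, built as a plain Python dict. The helper name nested_match_query and the search value "val5" are made up for illustration; only the "nested", "path" and "match" keys come from the Elasticsearch query DSL. Note that field names inside a nested query have to be given with their full path.

```python
import json


def nested_match_query(path, field, value):
    """Build a request body that matches `value` in `field` of the
    nested objects stored under `path` (hypothetical helper)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    # Nested fields are addressed by their full dotted path.
                    "match": {"{}.{}".format(path, field): value}
                },
            }
        }
    }


# Illustrative values only; adapt them to your real field names.
body = nested_match_query("metadata.more_data.contains_more", "bar", "val5")
print(json.dumps(body, indent=2))
```

The resulting dict can be handed to whatever client you use to send the search request; how you wire it into Haystack depends on the backend extension you end up writing.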
Indexing:
First of all, you should use dynamic templates if you want to define a specific mapping based on key names, or if your documents do not all have the same structure.
But 30 keys isn't that many, and you should prefer defining your own mapping over letting Elasticsearch guess it for you (if incorrect data happens to be indexed first, the mapping will be inferred from that data).
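As a rough illustration of what a dynamic template looks like, here is a sketch written as a Python dict. It assumes the same string-era mapping syntax used elsewhere in this thread; the rule name ids_not_analyzed and the "*_id" key pattern are invented for the example and are not taken from the question.

```python
import json

# Hypothetical mapping: every new string field whose name ends in "_id"
# is stored as a single unanalyzed token instead of being tokenized.
mapping = {
    "your_type_name": {
        "dynamic_templates": [
            {
                "ids_not_analyzed": {            # rule name, chosen freely
                    "match": "*_id",             # pattern on the key name
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }
        ]
    }
}

print(json.dumps(mapping, indent=2))
```

Templates are tried in order and the first matching one wins, so more specific patterns should come before catch-all ones.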
Searching:
You can't search for foz: val5, since the key "foz" doesn't exist.
But the key "metadata.more_data.even_more.foz" does: all your keys are flattened from the root of your document.
This means you'll have to search for
    foo: val5
    metadata.more_data.even_more.foz: 12*
    metadata.more_data.contains_more.bar: val*
    metadata.somekey1: val1
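If it helps to see the flattening spelled out, here is a small sketch that lists the dotted paths for a document. The sample document is made up to resemble the one discussed here, and flatten_keys is just an illustrative helper, not something Elasticsearch provides.

```python
def flatten_keys(doc, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested document."""
    for key, value in doc.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            yield from flatten_keys(value, path)
        elif isinstance(value, list):
            # Arrays do not add a path segment: each element shares the path.
            for element in value:
                if isinstance(element, dict):
                    yield from flatten_keys(element, path)
                else:
                    yield path, element
        else:
            yield path, value


# Hypothetical document, shaped like the one in the question.
doc = {
    "foo": "val5",
    "metadata": {
        "somekey1": "val1",
        "more_data": {
            "even_more": {"foz": 12},
            "contains_more": [{"bar": "val7"}, {"bar": "val8"}],
        },
    },
}

paths = sorted(set(path for path, _ in flatten_keys(doc)))
for path in paths:
    print(path)
```

The printed paths are the field names you can use in a query; a bare "foz" never shows up among them.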
Using query_string, for example:
"query_string": { "default_field": "metadata.more_data.even_more.foz", "query": "12*"}
Or, if you want to search in multiple fields:
"query_string": { "fields" : ["metadata.more_data.contains_more.bar", "metadata.somekey1"], "query": "val*"}
It took a while to figure out the right solution that works for me.
It was a mix of both answers provided by @juliendangers and @Val, plus some more customizing.
- I replaced Haystack with the more specific django-simple-elasticsearch
- Added a custom get_type_mapping method to the model:
    @classmethod
    def get_type_mapping(cls):
        return {
            "properties": {
                "somekey": {
                    "type": "<specific_type>",
                    "format": "<specific_format>",
                },
                "more_data": {
                    "type": "nested",
                    "include_in_parent": True,
                    "properties": {
                        "even_more": {
                            "type": "nested",
                            "include_in_parent": True,
                        },
                        # and so on for each level you care about
                    },
                },
            }
        }
- Added a custom get_document method to the model:
    @classmethod
    def get_document(cls, obj):
        return {
            'somekey': obj.somekey,
            'more_data': obj.more_data,
            # and so on
        }
- Added a custom Searchform:
    from django import forms
    # ElasticsearchForm and ElasticsearchProcessor come from django-simple-elasticsearch

    class Searchform(ElasticsearchForm):
        q = forms.CharField(required=False)

        def get_index(self):
            return 'your_index'

        def get_type(self):
            return 'your_model'

        def prepare_query(self):
            # Fall back to a match-everything query when the search box is empty
            if not self.cleaned_data['q']:
                q = "*"
            else:
                q = str(self.cleaned_data['q'])
            return {
                "query": {
                    "query_string": {
                        "query": q
                    }
                }
            }

        def search(self):
            esp = ElasticsearchProcessor(self.es)
            esp.add_search(self.prepare_query, page=1, page_size=25,
                           index=self.get_index(), doc_type=self.get_type())
            responses = esp.search()
            return responses[0]
So this is what worked for me and covers my use cases. Maybe it can be of some help to someone.