Best way to index arbitrary attribute-value pairs in Elasticsearch

elasticsearch elasticsearch-indices

If anyone is still looking for an answer, I wrote a post about how to index arbitrary data into Elasticsearch and then search by specific fields and values, all without blowing up your index mapping.

The post: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/

In short, you need to create the special index described in the post, then flatten your data using the flattenData function (https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4). The flattened data can then be safely indexed into Elasticsearch.

For example:

    flattenData({
        id: 1,
        name: "metamorphosis",
        author: "franz kafka"
    });

Will produce:

    [
        {
            "key": "id",
            "type": "long",
            "key_type": "id.long",
            "value_long": 1
        },
        {
            "key": "name",
            "type": "string",
            "key_type": "name.string",
            "value_string": "metamorphosis"
        },
        {
            "key": "author",
            "type": "string",
            "key_type": "author.string",
            "value_string": "franz kafka"
        }
    ]

And

    flattenData({
        id: 2,
        name: "techcorp laptop model x",
        type: "computer",
        memorygb: 4
    });

Will produce:

    [
        {
            "key": "id",
            "type": "long",
            "key_type": "id.long",
            "value_long": 2
        },
        {
            "key": "name",
            "type": "string",
            "key_type": "name.string",
            "value_string": "techcorp laptop model x"
        },
        {
            "key": "type",
            "type": "string",
            "key_type": "type.string",
            "value_string": "computer"
        },
        {
            "key": "memorygb",
            "type": "long",
            "key_type": "memorygb.long",
            "value_long": 4
        }
    ]
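The transformation shown in the two examples above can be sketched roughly as follows. This is not the code from the linked gist; flattenDataSketch and detectType are hypothetical names, the sketch only handles top-level primitive values, and the "double" and "boolean" type names are assumptions that do not appear in the examples.

```javascript
// Hypothetical sketch; NOT the code from the linked gist.
// Turns a flat object into an array of { key, type, key_type, value_<type> } entries.
function flattenDataSketch(data) {
    return Object.entries(data).map(([key, value]) => {
        const type = detectType(value);
        return {
            key: key,
            type: type,
            key_type: key + "." + type,
            // Computed property name: "value_long", "value_string", ...
            ["value_" + type]: value
        };
    });
}

// "long" and "string" come from the examples above;
// "double" and "boolean" are assumptions added for illustration.
function detectType(value) {
    if (typeof value === "string") return "string";
    if (typeof value === "boolean") return "boolean";
    if (typeof value === "number") return Number.isInteger(value) ? "long" : "double";
    // Nested objects, arrays and null are out of scope for this sketch.
    throw new TypeError("Unsupported value type: " + typeof value);
}

console.log(JSON.stringify(flattenDataSketch({ id: 1, name: "metamorphosis" }), null, 4));
```

Storing each value under a type-specific field (value_long, value_string) is what keeps the mapping at a fixed size no matter how many distinct keys the documents contain.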

Then you can build Elasticsearch queries to query your data. Every query should specify both the key and the type of the value. If you are unsure which keys or types the index contains, you can run an aggregation to find out; this is also discussed in the post.
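As a rough illustration of such an aggregation, the request body below asks for the distinct key_type values only. It is a sketch, not the query from the post: it assumes flatData is mapped as a nested field and flatData.key_type is indexed as a keyword, and the aggregation names flat_data and key_types as well as the function name are arbitrary.

```javascript
// Builds a search body that returns no hits, only a bucket per distinct key_type.
// Assumption: flatData is a nested field and flatData.key_type is a keyword field.
function buildKeyTypeAggregation(maxBuckets) {
    return {
        size: 0, // skip the hits, only the aggregation result is of interest
        aggs: {
            flat_data: {
                nested: { path: "flatData" },
                aggs: {
                    key_types: {
                        terms: { field: "flatData.key_type", size: maxBuckets }
                    }
                }
            }
        }
    };
}

console.log(JSON.stringify(buildKeyTypeAggregation(100), null, 4));
```

Each returned bucket key would then look like "author.string" or "memorygb.long", which tells you both the key and the value field to query.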

For example, to find a document where author == "franz kafka" you need to execute the following query:

    {
        "query": {
            "nested": {
                "path": "flatData",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"flatData.key": "author"}},
                            {"match": {"flatData.value_string": "franz kafka"}}
                        ]
                    }
                }
            }
        }
    }
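A nested query like the one above only works when flatData is mapped as a nested field. The mapping actually used is described in the post; the body below is only a guess at its shape, inferred from the field names in the examples. It assumes the typeless mapping format of Elasticsearch 7 or later, and indexBodySketch is a hypothetical name.

```javascript
// Hypothetical index-creation body, inferred from the field names used above.
// The mapping described in the post may differ (analyzers, extra value types, etc.).
const indexBodySketch = {
    mappings: {
        properties: {
            flatData: {
                type: "nested", // each entry is indexed as its own hidden document
                properties: {
                    key: { type: "keyword" },       // exact match via term query
                    type: { type: "keyword" },
                    key_type: { type: "keyword" },
                    value_string: { type: "text" }, // analyzed, searched via match query
                    value_long: { type: "long" }    // supports range queries
                }
            }
        }
    }
};

console.log(JSON.stringify(indexBodySketch, null, 4));
```

The mapping stays this size however many different keys the source documents use, because keys are stored as values of flatData.key rather than as field names.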

To find documents where type == "computer" and memorygb > 4 you need to execute the following query:

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "flatData",
                            "query": {
                                "bool": {
                                    "must": [
                                        {"term": {"flatData.key": "type"}},
                                        {"match": {"flatData.value_string": "computer"}}
                                    ]
                                }
                            }
                        }
                    },
                    {
                        "nested": {
                            "path": "flatData",
                            "query": {
                                "bool": {
                                    "must": [
                                        {"term": {"flatData.key": "memorygb"}},
                                        {"range": {"flatData.value_long": {"gt": 4}}}
                                    ]
                                }
                            }
                        }
                    }
                ]
            }
        }
    }

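Here, because we want the same document to match both conditions, we use an outer bool query with a must clause wrapping two nested queries. Each nested query is evaluated against one flatData entry at a time, and no single entry holds both keys, so the two conditions cannot share one nested query.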
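Since every condition repeats the same nested wrapper, the queries can be assembled with two small helpers. This is a minimal sketch; nestedCondition and allOf are hypothetical names and not part of the post or the gist.

```javascript
// Wraps one key/value condition in its own nested query on flatData.
function nestedCondition(key, valueClause) {
    return {
        nested: {
            path: "flatData",
            query: {
                bool: {
                    must: [
                        { term: { "flatData.key": key } },
                        valueClause
                    ]
                }
            }
        }
    };
}

// Combines conditions so that a document has to satisfy all of them.
function allOf(conditions) {
    return { query: { bool: { must: conditions } } };
}

// Rebuilds the query for type == "computer" and memorygb > 4.
const computerQuery = allOf([
    nestedCondition("type", { match: { "flatData.value_string": "computer" } }),
    nestedCondition("memorygb", { range: { "flatData.value_long": { gt: 4 } } })
]);

console.log(JSON.stringify(computerQuery, null, 4));
```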


Elasticsearch is a schema-less data store that allows dynamic indexing of new attributes, and optional fields have no performance impact. Your first mapping is absolutely fine, and you can run boolean queries against your dynamic attributes. There is no inherent performance benefit in making them nested fields; they will be flattened on indexing anyway, into fields such as fields.type, fields.memorygb, and so on.

By contrast, your last mapping, which stores the attributes as key-value pairs, will have a performance impact, because every condition has to query two different indexed fields, i.e. key = 'memorygb' and value = 4.
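To illustrate the difference, with dynamic mapping each attribute becomes its own field, so a condition is a single clause. The sketch below assumes the attributes live under an object called fields, as in the names fields.type and fields.memorygb mentioned above; dynamicMappingQuery is a hypothetical name.

```javascript
// One clause per condition: the attribute name is the field name.
// Assumption: attributes are indexed under "fields", e.g. fields.type, fields.memorygb.
const dynamicMappingQuery = {
    query: {
        bool: {
            must: [
                { match: { "fields.type": "computer" } },
                { term: { "fields.memorygb": 4 } }
            ]
        }
    }
};

console.log(JSON.stringify(dynamicMappingQuery, null, 4));
```

The key-value layout needs two clauses for each of these conditions, one on the key field and one on the value field.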

Have a look at the documentation about dynamic mapping:

One of the most important features of Elasticsearch is its ability to be schema-less. There is no performance overhead if an object is dynamic, the ability to turn it off is provided as a safety mechanism so "malformed" objects won’t, by mistake, index data that we do not wish to be indexed.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-object-type.html


You need a filtered query: combine a range query with a match query.
