
ElasticSearch Order By String Length


You can do this with script-based sorting.

As a toy example, I set up a trivial index with a few documents:

PUT /test_index

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name":"Bob"}
{"index":{"_id":2}}
{"name":"Jeff"}
{"index":{"_id":3}}
{"name":"Darlene"}
{"index":{"_id":4}}
{"name":"Jose"}
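If you build the same bulk request from code instead of typing it into Sense, keep in mind that the _bulk body is newline-delimited JSON: one action line, then one source line per document, and the body must end with a newline. Here is a minimal Python sketch that produces the body above; the function name `build_bulk_body` is hypothetical, and the index and type are assumed to come from the URL (/test_index/doc/_bulk) as in the request shown.

```python
import json


def build_bulk_body(names):
    """Build an NDJSON _bulk body: an action line plus a source line per document."""
    lines = []
    for doc_id, name in enumerate(names, start=1):
        # Action/metadata line; index and type are taken from the request URL.
        lines.append(json.dumps({"index": {"_id": doc_id}}, separators=(",", ":")))
        # Document source line.
        lines.append(json.dumps({"name": name}, separators=(",", ":")))
    # The bulk API expects the body to end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body(["Bob", "Jeff", "Darlene", "Jose"]), end="")
```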

Then I can order search results like this:

POST /test_index/_search
{
   "query": {
      "match_all": {}
   },
   "sort": {
      "_script": {
         "script": "doc['name'].value.length()",
         "type": "number",
         "order": "asc"
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": null,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": null,
            "_source": {
               "name": "Bob"
            },
            "sort": [
               3
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "4",
            "_score": null,
            "_source": {
               "name": "Jose"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": null,
            "_source": {
               "name": "Jeff"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": null,
            "_source": {
               "name": "Darlene"
            },
            "sort": [
               7
            ]
         }
      ]
   }
}
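To check the expected order without a cluster, here is a minimal Python sketch that applies the same sort key as the script (the length of the name) to the four sample documents locally. It only simulates what the `_script` sort computes; it is not an Elasticsearch client call, and `sort_by_name_length` is a hypothetical helper.

```python
# Sample documents from the bulk request, keyed by _id.
DOCS = {1: "Bob", 2: "Jeff", 3: "Darlene", 4: "Jose"}


def sort_by_name_length(docs, descending=False):
    """Return (_id, name, sort_value) tuples ordered by len(name), like the _script sort."""
    hits = [(doc_id, name, len(name)) for doc_id, name in docs.items()]
    # sorted() is stable, so ties (Jeff and Jose, both length 4) keep insertion
    # order here; the tie order in an Elasticsearch response is not something to rely on.
    return sorted(hits, key=lambda hit: hit[2], reverse=descending)


if __name__ == "__main__":
    for doc_id, name, length in sort_by_name_length(DOCS):
        print(doc_id, name, length)
```

The sort values match the response (3, 4, 4, 7); only the relative order of the two length-4 names may differ.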

To filter by length, I can use a script filter in a similar way:

POST /test_index/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "script": {
               "script": "doc['name'].value.length() > 3",
               "params": {}
            }
         }
      }
   },
   "sort": {
      "_script": {
         "script": "doc['name'].value.length()",
         "type": "number",
         "order": "asc"
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": null,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "4",
            "_score": null,
            "_source": {
               "name": "Jose"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": null,
            "_source": {
               "name": "Jeff"
            },
            "sort": [
               4
            ]
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": null,
            "_source": {
               "name": "Darlene"
            },
            "sort": [
               7
            ]
         }
      ]
   }
}
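The filter and the sort combine in a simple way: documents whose name length fails the predicate are dropped first, and the rest are ordered by length. This minimal Python sketch reproduces that locally on the same sample data (a simulation only; `filter_and_sort` is a hypothetical helper, and the `> 3` threshold is the one used in the script filter above).

```python
# Sample documents from the bulk request, keyed by _id.
DOCS = {1: "Bob", 2: "Jeff", 3: "Darlene", 4: "Jose"}


def filter_and_sort(docs, min_length):
    """Keep names strictly longer than min_length, then order them by length ascending."""
    # Equivalent of the script filter: doc['name'].value.length() > min_length
    kept = [(doc_id, name) for doc_id, name in docs.items() if len(name) > min_length]
    # Equivalent of the _script sort on the name length.
    return sorted(kept, key=lambda hit: len(hit[1]))


if __name__ == "__main__":
    for doc_id, name in filter_and_sort(DOCS, 3):
        print(doc_id, name)
```

As in the response, "Bob" is filtered out and three hits remain, with "Darlene" last.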

Here's the code I used:

http://sense.qbox.io/gist/22fef6dc5453eaaae3be5fb7609663cc77c43dab

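P.S.: If any of the last names contain spaces, you will probably want to set "index": "not_analyzed" on that field. Otherwise the field is analyzed into separate terms, and doc['name'].value returns only one of those terms rather than the whole name, so the computed length will be wrong.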
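A minimal Python sketch of both halves of that advice follows. `MAPPING` is the body you would send with the initial `PUT /test_index` (before indexing, since an existing field cannot be switched from analyzed to not_analyzed), using the older `"type": "string"` syntax that matches the rest of this answer. The `simple_analyze` and `script_length` helpers are hypothetical and only approximate the behaviour: they assume a whitespace-and-lowercase analyzer and assume the script sees the lexicographically smallest term of an analyzed field. "Van Der Berg" is a made-up example name.

```python
import json

# Mapping for the older "string" field type: analyzed by default, so
# "index": "not_analyzed" keeps the whole value as a single term.
MAPPING = {
    "mappings": {
        "doc": {
            "properties": {
                "name": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}


def simple_analyze(text):
    """Rough stand-in for an analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def script_length(text, analyzed=True):
    """Approximate doc['name'].value.length() for one document.

    Assumption: on an analyzed field the script sees a single term, modelled
    here as the lexicographically smallest token; on a not_analyzed field it
    sees the full original string.
    """
    if not analyzed:
        return len(text)
    return len(min(simple_analyze(text)))


if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=3))
    print(script_length("Van Der Berg"), script_length("Van Der Berg", analyzed=False))
```

For a single-word name both cases agree, which is why the toy example above works without the mapping change.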