Getting distinct values using NEST ElasticSearch client

c# .net elasticsearch nest

You are correct that what you want is a terms aggregation. The problem you're running into is that ES analyzes the "BrandName" field by default, so the aggregation returns the individual tokens instead of the whole value. This is the expected default behavior for a string field in ES.

What I recommend is that you change BrandName into a multi-field. This lets you search on the individual tokens and also run a terms aggregation on the "not analyzed" version, which keeps the full term (e.g. "20th Century Fox").

Here is the documentation from ES.

https://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html

[UPDATE] If you are using ES version 1.4 or newer, the syntax for multi-fields is a little different.

https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields

Here is a full working sample that illustrates the point in ES 1.4.4. Note that the mapping specifies a "not_analyzed" version of the field.

PUT hilden1

PUT hilden1/type1/_mapping
{
  "properties": {
    "brandName": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

POST hilden1/type1
{
  "brandName": "foo"
}

POST hilden1/type1
{
  "brandName": "bar"
}

POST hilden1/type1
{
  "brandName": "20th Century Fox"
}

POST hilden1/type1
{
  "brandName": "20th Century Fox"
}

POST hilden1/type1
{
  "brandName": "foo bar"
}

GET hilden1/type1/_search
{
  "size": 0,
  "aggs": {
    "analyzed_field": {
      "terms": {
        "field": "brandName",
        "size": 10
      }
    },
    "non_analyzed_field": {
      "terms": {
        "field": "brandName.raw",
        "size": 10
      }
    }
  }
}

Results of the last query:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 5,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "non_analyzed_field": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "20th Century Fox",
               "doc_count": 2
            },
            {
               "key": "bar",
               "doc_count": 1
            },
            {
               "key": "foo",
               "doc_count": 1
            },
            {
               "key": "foo bar",
               "doc_count": 1
            }
         ]
      },
      "analyzed_field": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "20th",
               "doc_count": 2
            },
            {
               "key": "bar",
               "doc_count": 2
            },
            {
               "key": "century",
               "doc_count": 2
            },
            {
               "key": "foo",
               "doc_count": 2
            },
            {
               "key": "fox",
               "doc_count": 2
            }
         ]
      }
   }
}

Notice that the not-analyzed field keeps "20th Century Fox" and "foo bar" together, whereas the analyzed field breaks them up into lowercased tokens.
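To make the difference concrete, here is a rough Python sketch that mimics the two aggregations without touching ES. The `analyze` function is only a stand-in I'm assuming for illustration (lowercase, split on whitespace); the real analyzer does more, but it gives the same tokens for these five sample values. The function and variable names are made up for this sketch.

```python
from collections import Counter

# The five sample documents indexed in the example above.
BRAND_NAMES = ["foo", "bar", "20th Century Fox", "20th Century Fox", "foo bar"]


def analyze(text):
    """Simplified stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def terms_aggregation(values, analyzed):
    """Count how many documents contain each term (like a bucket's doc_count)."""
    counts = Counter()
    for value in values:
        # Analyzed: one term per token. Not analyzed: the whole value is one term.
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    return dict(counts)


analyzed_buckets = terms_aggregation(BRAND_NAMES, analyzed=True)
raw_buckets = terms_aggregation(BRAND_NAMES, analyzed=False)

print(sorted(analyzed_buckets.items()))
print(sorted(raw_buckets.items()))
```

The counts line up with the buckets in the response: each token of "20th Century Fox" shows up twice in the analyzed version, while the raw version has a single "20th Century Fox" bucket with a count of 2.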


I had a similar issue. I was displaying search results and wanted to show counts for each category and subcategory.

You're right to use aggregations. I also had the issue of strings being tokenised (i.e. "20th Century Fox" being split); this happens because the fields are analysed. I fixed it by adding the following mapping, which tells ES not to analyse those fields:

  "category": {
    "type": "nested",
    "properties": {
      "CategoryNameAndSlug": {
        "type": "string",
        "index": "not_analyzed"
      },
      "SubCategoryNameAndSlug": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }

As jhilden suggested, if you use this field for more than one purpose (e.g. search and aggregation), you can set it up as a multi-field. One version is analysed and used for searching, while the other is left unanalysed and used for aggregation.
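As a sketch of how the pieces fit together, the Python below builds the multi-field mapping and the terms aggregation request as plain dictionaries, then pulls the distinct values out of a response. It follows the ES 1.x string syntax used in this thread. The helper names, the "distinct_values" aggregation name, and the trimmed sample response are all made up for illustration, and nothing here talks to a real cluster.

```python
import json


def multi_field_mapping(field):
    """Mapping with an analyzed main field plus a not_analyzed 'raw' sub-field."""
    return {
        "properties": {
            field: {
                "type": "string",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


def distinct_values_request(field, size=10):
    """Search body that skips the hits and aggregates on the raw sub-field."""
    return {
        "size": 0,
        "aggs": {
            "distinct_values": {
                "terms": {"field": field + ".raw", "size": size}
            }
        },
    }


def distinct_values(response):
    """Return the bucket keys from the 'distinct_values' terms aggregation."""
    buckets = response["aggregations"]["distinct_values"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hand-written response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "distinct_values": {
            "buckets": [
                {"key": "20th Century Fox", "doc_count": 2},
                {"key": "foo bar", "doc_count": 1},
            ]
        }
    }
}

print(json.dumps(multi_field_mapping("brandName"), indent=2))
print(json.dumps(distinct_values_request("brandName"), indent=2))
print(distinct_values(sample_response))
```

Searches keep using the analysed main field (here "brandName"), while the aggregation targets "brandName.raw", so the same data serves both purposes.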
