Multi-field, multi-word, match without query_string


What you are looking for is the multi_match query, but it doesn't behave in quite the way you would like.

Compare the output of validate for multi_match vs query_string.

multi_match (with operator and) requires that ALL terms exist in the same field, for at least one of the fields:

    curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true' -d '
    {
       "multi_match" : {
          "operator" : "and",
          "fields" : [
             "firstname",
             "lastname"
          ],
          "query" : "john smith"
       }
    }
    '

    # {
    #    "_shards" : {
    #       "failed" : 0,
    #       "successful" : 1,
    #       "total" : 1
    #    },
    #    "explanations" : [
    #       {
    #          "index" : "test",
    #          "explanation" : "((+lastname:john +lastname:smith) | (+firstname:john +firstname:smith))",
    #          "valid" : true
    #       }
    #    ],
    #    "valid" : true
    # }

By contrast, query_string (with default_operator AND) checks that EACH term exists in at least one field, not necessarily the same one:

    curl -XGET 'http://127.0.0.1:9200/_validate/query?pretty=1&explain=true' -d '
    {
       "query_string" : {
          "fields" : [
             "firstname",
             "lastname"
          ],
          "query" : "john smith",
          "default_operator" : "AND"
       }
    }
    '

    # {
    #    "_shards" : {
    #       "failed" : 0,
    #       "successful" : 1,
    #       "total" : 1
    #    },
    #    "explanations" : [
    #       {
    #          "index" : "test",
    #          "explanation" : "+(firstname:john | lastname:john) +(firstname:smith | lastname:smith)",
    #          "valid" : true
    #       }
    #    ],
    #    "valid" : true
    # }
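
To make the difference between the two explanations concrete, here is a toy model in plain Python. It is not Elasticsearch code: the function names are made up for illustration, and whitespace splitting stands in for real analysis. It only mirrors the boolean shape of the two explain outputs above.

```python
def field_centric(doc, fields, terms):
    # Shape of the multi_match explanation: (+f1:a +f1:b) | (+f2:a +f2:b)
    # At least one single field must contain every term.
    return any(
        all(term in doc.get(field, "").split() for term in terms)
        for field in fields
    )


def term_centric(doc, fields, terms):
    # Shape of the query_string explanation: +(f1:a | f2:a) +(f1:b | f2:b)
    # Every term must appear in at least one field.
    return all(
        any(term in doc.get(field, "").split() for field in fields)
        for term in terms
    )


doc = {"firstname": "john", "lastname": "smith"}
fields = ["firstname", "lastname"]
terms = ["john", "smith"]

print(field_centric(doc, fields, terms))  # False: no single field holds both words
print(term_centric(doc, fields, terms))   # True: each word is found in some field
```

The document with firstname john and lastname smith only matches under the term-centric shape, which is why multi_match with operator and does not do what the question asks for.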

So you have a few choices to achieve what you are after:

  1. Preparse the search terms, to remove things like wildcards, etc, before using the query_string

  2. Preparse the search terms to extract each word, then generate a multi_match query per word

  3. Use index_name in your mapping for the name fields to index their data into a single field, which you can then use for search (like your own custom _all field), as follows:

    curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
    {
       "mappings" : {
          "test" : {
             "properties" : {
                "firstname" : {
                   "index_name" : "name",
                   "type" : "string"
                },
                "lastname" : {
                   "index_name" : "name",
                   "type" : "string"
                }
             }
          }
       }
    }
    '

    curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
    {
       "firstname" : "john",
       "lastname" : "smith"
    }
    '

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "name" : {
                "operator" : "and",
                "query" : "john smith"
             }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [
    #          {
    #             "_source" : {
    #                "firstname" : "john",
    #                "lastname" : "smith"
    #             },
    #             "_score" : 0.2712221,
    #             "_index" : "test",
    #             "_id" : "VJFU_RWbRNaeHF9wNM8fRA",
    #             "_type" : "test"
    #          }
    #       ],
    #       "max_score" : 0.2712221,
    #       "total" : 1
    #    },
    #    "timed_out" : false,
    #    "_shards" : {
    #       "failed" : 0,
    #       "successful" : 5,
    #       "total" : 5
    #    },
    #    "took" : 33
    # }

Note however, that firstname and lastname are no longer searchable independently. The data for both fields has been indexed into name.

You could use multi-fields with the path parameter to make them searchable both independently and together, as follows:

    curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
    {
       "mappings" : {
          "test" : {
             "properties" : {
                "firstname" : {
                   "fields" : {
                      "firstname" : {
                         "type" : "string"
                      },
                      "any_name" : {
                         "type" : "string"
                      }
                   },
                   "path" : "just_name",
                   "type" : "multi_field"
                },
                "lastname" : {
                   "fields" : {
                      "any_name" : {
                         "type" : "string"
                      },
                      "lastname" : {
                         "type" : "string"
                      }
                   },
                   "path" : "just_name",
                   "type" : "multi_field"
                }
             }
          }
       }
    }
    '

    curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
    {
       "firstname" : "john",
       "lastname" : "smith"
    }
    '

Searching the any_name field works:

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "any_name" : {
                "operator" : "and",
                "query" : "john smith"
             }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [
    #          {
    #             "_source" : {
    #                "firstname" : "john",
    #                "lastname" : "smith"
    #             },
    #             "_score" : 0.2712221,
    #             "_index" : "test",
    #             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
    #             "_type" : "test"
    #          }
    #       ],
    #       "max_score" : 0.2712221,
    #       "total" : 1
    #    },
    #    "timed_out" : false,
    #    "_shards" : {
    #       "failed" : 0,
    #       "successful" : 5,
    #       "total" : 5
    #    },
    #    "took" : 11
    # }

Searching firstname for john AND smith returns no hits, as expected, because firstname contains only john:

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "firstname" : {
                "operator" : "and",
                "query" : "john smith"
             }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [],
    #       "max_score" : null,
    #       "total" : 0
    #    },
    #    "timed_out" : false,
    #    "_shards" : {
    #       "failed" : 0,
    #       "successful" : 5,
    #       "total" : 5
    #    },
    #    "took" : 2
    # }

But searching firstname for just john works correctly:

    curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
    {
       "query" : {
          "match" : {
             "firstname" : {
                "operator" : "and",
                "query" : "john"
             }
          }
       }
    }
    '

    # {
    #    "hits" : {
    #       "hits" : [
    #          {
    #             "_source" : {
    #                "firstname" : "john",
    #                "lastname" : "smith"
    #             },
    #             "_score" : 0.30685282,
    #             "_index" : "test",
    #             "_id" : "Xf9qqKt0TpCuyLWioNh-iQ",
    #             "_type" : "test"
    #          }
    #       ],
    #       "max_score" : 0.30685282,
    #       "total" : 1
    #    },
    #    "timed_out" : false,
    #    "_shards" : {
    #       "failed" : 0,
    #       "successful" : 5,
    #       "total" : 5
    #    },
    #    "took" : 3
    # }
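
For completeness, choice 2 from the list above (one multi_match query per word) could be sketched roughly as below. This is only an illustration: the helper name build_per_word_query is hypothetical, splitting on whitespace is a simplifying assumption rather than real analysis, and combining the per-word queries under bool/must is one way to require every word, not something the answer above spells out.

```python
import json


def build_per_word_query(text, fields):
    """Build a search body that requires every word to match in some field.

    Hypothetical helper: one multi_match clause per whitespace-separated word,
    all combined as "must" clauses of a bool query.
    """
    words = text.split()
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": word, "fields": list(fields)}}
                    for word in words
                ]
            }
        }
    }


body = build_per_word_query("john smith", ["firstname", "lastname"])
print(json.dumps(body, indent=2))
```

The resulting body can be sent with -d to the _search endpoint in the same way as the examples above. Empty input produces an empty must list, so the caller should decide how to handle that case.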


I would rather avoid using query_string in case the user passes "OR", "AND" and any of the other advanced params.

In my experience, escaping the special characters with backslash is a simple and effective solution. The list can be found in the documentation http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description, plus AND/OR/NOT/TO.
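
A minimal sketch of that escaping, in Python. The function name escape_query_string is made up for illustration, and the character set is my reading of the special characters listed on the linked Lucene 4.5 page. Lowercasing AND/OR/NOT/TO so that they are treated as plain words is an assumption that only makes sense if the searched fields use a lowercasing analyzer.

```python
import re

# Syntax characters of the Lucene classic query parser: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
# (& and | are escaped individually, which also covers && and ||).
_SPECIAL_CHARS = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

# Upper-case keywords that the parser treats as operators.
_OPERATORS = re.compile(r'\b(AND|OR|NOT|TO)\b')


def escape_query_string(text):
    """Backslash-escape special characters and neutralise operator keywords."""
    escaped = _SPECIAL_CHARS.sub(r'\\\1', text)
    # Assumption: lowercasing is acceptable because the fields are lowercased at index time.
    return _OPERATORS.sub(lambda m: m.group(1).lower(), escaped)


print(escape_query_string('john AND (smith OR jones)*'))
```

The escaped string can then be passed as the query value of a query_string query with default_operator AND.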


Nowadays you can use the cross_fields type in multi_match:

    GET /_validate/query?explain
    {
        "query": {
            "multi_match": {
                "query":       "peter smith",
                "type":        "cross_fields",
                "operator":    "and",
                "fields":      [ "firstname", "lastname", "middlename" ]
            }
        }
    }

The cross_fields type takes a term-centric approach: it treats all of the fields as one big field and looks for each term in any of them.

One thing to note, though: for it to work optimally, all of the analyzed fields should use the same analyzer (standard, english, etc.):

For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.

If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For instance, if we added the title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows:

(+title:peter +title:smith) ( +blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]) )