ElasticSearch: aggregation on _score field w/ Groovy disabled

groovy elasticsearch scripting

One possible approach is to use the other scripting options available. mvel seems not to be possible to be used unless dynamic scripting is enabled. And, unless a more fine-grained control of scripting enable/disable reaches 1.6 version, I don't think is possible to enable dynamic scripting for mvel and not for groovy.

We are left with native and mustache (used for templates) that are enabled by default. I don't think custom scripting can be done with mustache, if it's possible I didn't find a way and we are left with native (Java) scripting.

Here's my take to this:

create an implementation of NativeScriptFactory:

package com.foo.script;import java.util.Map;import org.elasticsearch.script.ExecutableScript;import org.elasticsearch.script.NativeScriptFactory;public class MyScriptNativeScriptFactory implements NativeScriptFactory {    @Override    public ExecutableScript newScript(Map<String, Object> arg0) {        return new MyScript();    }}

an implementation of AbstractFloatSearchScript for example:

package com.foo.script;import java.io.IOException;import org.elasticsearch.script.AbstractFloatSearchScript;public class MyScript extends AbstractFloatSearchScript {    @Override    public float runAsFloat() {        try {            return score();        } catch (IOException e) {            e.printStackTrace();        }        return 0;    }}

alternatively, build a simple Maven project to tie all together. pom.xml:

<properties>    <elasticsearch.version>1.5.2</elasticsearch.version>    <maven.compiler.source>1.8</maven.compiler.source>    <maven.compiler.target>1.8</maven.compiler.target></properties><dependencies>    <dependency>        <groupId>org.elasticsearch</groupId>        <artifactId>elasticsearch</artifactId>        <version>${elasticsearch.version}</version>        <scope>compile</scope>    </dependency></dependencies><build>    <sourceDirectory>src</sourceDirectory>    <plugins>        <plugin>            <artifactId>maven-compiler-plugin</artifactId>            <version>3.1</version>            <configuration>                <source>1.8</source>                <target>1.8</target>            </configuration>        </plugin>    </plugins></build>

build it and get the resulting jar file.
place the jar inside [ES_folder]/lib
edit elasticsearch.yml and addscript.native.my_script.type: com.foo.script.MyScriptNativeScriptFactory
restart ES nodes.
use it in aggregations:

{  "aggs": {    "max_score": {      "max": {        "script": "my_script",        "lang": "native"      }    }  }}

My sample above just returns the _score as a script but, of course, it can be used in more advanced scenarios.

EDIT: if you are not allowed to touch the instances, then I don't think you have any options.

groovy elasticsearch scripting

ElasticSearch at least of version 1.7.1 and possibly earlier also offers the use of Lucene's Expression scripting language – and as Expression is sandboxed by default it can be used for dynamic inline scripts in much the same way that Groovy was. In our case, where our production ES cluster has just been upgraded from 1.4.1 to 1.7.1, we decided not to use Groovy anymore because of it's non-sandboxed nature, although we really still want to make use of dynamic scripts because of the ease of deployment and the flexibility they offer as we continue to fine-tune our application and its search layer.

While writing a native Java script as a replacement for our dynamic Groovy function scores may have also have been a possibility in our case, we wanted to look at the feasibility of using Expression for our dynamic inline scripting language instead. After reading through the documentation, I found that we were simply able to change the "lang" attribute from "groovy" to "expression" in our inline function_score scripts and with the script.inline: sandbox property set in the .../config/elasticsearch.yml file – the function score script worked without any other modification. As such, we can now continue to use dynamic inline scripting within ElasticSearch, and do so with sandboxing enabled (as Expression is sandboxed by default). Obviously other security measures such as running your ES cluster behind an application proxy and firewall should also be implemented to ensure that outside users have no direct access to your ES nodes or the ES API. However, this was a very simple change, that for now has solved a problem with Groovy's lack of sandboxing and the concerns over enabling it to run without sandboxing.

While switching your dynamic scripts to Expression may only work or be applicable in some cases (depending on the complexity of your inline dynamic scripts), it seemed it was worth sharing this information in the hopes it could help other developers.

As a note, one of the other supported ES scripting languages, Mustache only appears to be usable for creating templates within your search queries. It does not appear to be usable for any of the more complexing scripting needs such as function_score, etc., although I am not sure this was entirely apparent during the first read through of the updated ES documentation.

Lastly, one further issue to be mindful of is that the use of Lucene Expression scripts are marked as an experimental feature in the latest ES release and the documentation notes that as this scripting extension is undergoing significant development work at this time, its usage or functionality may change in later versions of ES. Thus if you do switch over to using Expression for any of your scripts (dynamic or otherwise), it should be noted in your documentation/developer notes to revisit these changes before upgrading your ES installation next time to ensure your scripts remain compatible and work as expected.

For our situation at least, unless we were willing to allow non-sandboxed dynamic scripting to be enabled again in the latest version of ES (via the script.inline: on option) so that inline Groovy scripts could continue to run, switching over to Lucene Expression scripting seemed like the best option for now.

It will be interesting to see what changes occur to the scripting choices for ES in future releases, especially given that the (apparently ineffective) sandboxing option for Groovy will be completely removed by version 2.0. Hopefully other protections can be put in place to enable dynamic Groovy usage, or perhaps Lucene Expression scripting will take Groovy's place and will enable all the types of dynamic scripting that developers are already making use of.

For more notes on Lucene Expression see the ES documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts – this page is also the source of the note regarding the planned removal of Groovy's sandboxing option from ES v2.0+. Further Lucene Expression documentation can be found here: http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html

CodeHunter

ElasticSearch: aggregation on _score field w/ Groovy disabled

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last