Elasticsearch + Apache Spark performance

elasticsearch apache-spark apache-spark-sql

I figured out what was going on. I was trying to manipulate the DataFrame schema because some of my fields contain a dot, e.g. user.firstname. This seems to cause a problem in Spark's collect phase. To resolve it, I re-indexed my data so that the field names use an underscore instead of a dot, e.g. user_firstname.
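For illustration, the renaming step before re-indexing could look roughly like the sketch below. It is plain Python, not the code I actually ran. The helper name `underscore_keys` and the sample document are made up, and it assumes that a dotted field may show up either as a nested object or as a literal dotted key.

```python
def underscore_keys(doc, parent=""):
    """Return a flat copy of doc whose field names use '_' instead of '.'.

    Hypothetical helper: nested objects are flattened, so
    {"user": {"firstname": ...}} becomes {"user_firstname": ...}.
    """
    flat = {}
    for key, value in doc.items():
        # Prefix with the parent path, then replace any literal dots.
        name = f"{parent}_{key}" if parent else key
        name = name.replace(".", "_")
        if isinstance(value, dict):
            flat.update(underscore_keys(value, name))
        else:
            flat[name] = value
    return flat


# Made-up sample document, for illustration only.
doc = {"user": {"firstname": "Ada"}, "user.lastname": "Lovelace", "age": 36}
renamed = underscore_keys(doc)
print(renamed)
```

Note that this sketch does not guard against collisions: if a document already has a `user_firstname` field next to `user.firstname`, one value overwrites the other.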

I'm afraid you can't perform a group by over 1.4 TB with only 120 GB of total RAM and achieve good performance. The DataFrame will try to load all the data into memory or onto disk, and only then perform the group by. I don't think the Spark/ES connector currently translates SQL syntax into the ES query language.
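If the grouping cannot be translated for you, one option is to express it directly in the Elasticsearch query DSL and let the cluster do the aggregation. Below is a minimal sketch that only builds the request body for a terms aggregation. The function name, the bucket name `groups`, and the field `user_firstname` are assumptions for illustration, and the field would need to be aggregatable (e.g. a keyword field). Keep in mind that a terms aggregation returns the top buckets, so it is not an exact replacement for a full SQL group by.

```python
import json


def terms_aggregation_body(field, size=100):
    """Build an Elasticsearch search body that groups documents by field.

    Hypothetical helper: "size": 0 at the top level suppresses the hits,
    so only the aggregation buckets come back.
    """
    return {
        "size": 0,
        "aggs": {
            "groups": {"terms": {"field": field, "size": size}},
        },
    }


# "user_firstname" is an assumed field name, used only as an example.
body = terms_aggregation_body("user_firstname")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent to the index's `_search` endpoint with whatever client you already use.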
