
Spark UI History server on Kubernetes?


Yes, it is possible. Briefly, you will need to ensure the following:

  • Make sure all your applications store event logs in a specific location (filesystem, S3, HDFS, etc.).
  • Deploy the History Server in your cluster with access to the above event log location.

By default, Spark reads event logs only from a filesystem path, so I will elaborate on this case in detail using the Spark Operator:

  • Create a PVC with a volume type that supports ReadWriteMany mode, for example an NFS volume. The following snippet assumes you already have a storage class for NFS (nfs-volume) configured:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc
  namespace: spark-apps
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-volume
```
  • Make sure all your Spark applications have event logging enabled and pointed at the correct path:
```yaml
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt"
```
  • Mount the event logs volume into each application pod (you can also use the operator's mutating webhook to centralize this). An example manifest with the above configuration is shown below:
```yaml
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-java-pi
  namespace: spark-apps
spec:
  type: Java
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.4
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  imagePullPolicy: Always
  sparkVersion: 2.4.4
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt"
  restartPolicy:
    type: Never
  volumes:
    - name: spark-data
      persistentVolumeClaim:
        claimName: spark-pvc
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.4
    serviceAccount: spark
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.4
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
```
  • Install the Spark History Server with the shared volume mounted. You will then be able to see the events in the History Server UI:
```yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  replicas: 1
  template:
    metadata:
      name: spark-history-server
      labels:
        app: spark-history-server
    spec:
      containers:
        - name: spark-history-server
          image: gcr.io/spark-operator/spark:v2.4.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "100m"
          command:
            - /sbin/tini
            - -s
            - --
            - /opt/spark/bin/spark-class
            - -Dspark.history.fs.logDirectory=/data/
            - org.apache.spark.deploy.history.HistoryServer
          ports:
            - name: http
              protocol: TCP
              containerPort: 18080
          readinessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          livenessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: spark-pvc
            readOnly: true
```
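As an aside, instead of repeating the event-log properties in every SparkApplication manifest, you could bake them into your Spark image via `spark-defaults.conf`. A minimal sketch (the `/opt/spark/conf/spark-defaults.conf` location is Spark's usual default and assumed here):

```
# /opt/spark/conf/spark-defaults.conf
spark.eventLog.enabled  true
spark.eventLog.dir      file:/mnt
```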

Feel free to configure an Ingress or Service for accessing the UI.
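For example, a minimal Service and Ingress exposing port 18080 might look like the sketch below; the hostname `spark-history.example.com` is a placeholder you would replace with your own:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  selector:
    app: spark-history-server
  ports:
    - name: http
      port: 80
      targetPort: 18080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  rules:
    - host: spark-history.example.com  # placeholder hostname
      http:
        paths:
          - path: /
            backend:
              serviceName: spark-history-server
              servicePort: http
```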

Also, you can use Google Cloud Storage, Azure Blob Storage, or AWS S3 as the event log location. For this you will need to install some extra jars, so I would recommend having a look at the Lightbend Spark History Server image and charts.
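As a rough sketch of the S3 variant, assuming the hadoop-aws and matching aws-java-sdk jars are on the image and a bucket name of `my-spark-logs` (both assumptions you would adapt), the application side would look like this, with `spark.history.fs.logDirectory` pointed at the same `s3a://` path on the History Server side:

```yaml
  sparkConf:
    "spark.eventLog.enabled": "true"
    # bucket name is a placeholder
    "spark.eventLog.dir": "s3a://my-spark-logs/events"
    # credentials can instead come from an IAM role / instance profile
    "spark.hadoop.fs.s3a.access.key": "<access-key>"
    "spark.hadoop.fs.s3a.secret.key": "<secret-key>"
```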