Spark UI History server on Kubernetes?
Yes, it is possible. Briefly, you will need to ensure the following:
- Make sure all your applications store event logs in a specific location (filesystem, S3, HDFS, etc.).
- Deploy the history server in your cluster with access to the above event logs location.
Now, Spark (by default) only reads from a filesystem path, so I will elaborate on this case in detail using the Spark Operator:
- Create a `PVC` with a volume type that supports the `ReadWriteMany` access mode, for example an `NFS` volume. The following snippet assumes you already have a storage class for `NFS` (`nfs-volume`) configured:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc
  namespace: spark-apps
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-volume
```
- Make sure all your Spark applications have event logging enabled and pointed at the correct path:
```yaml
sparkConf:
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "file:/mnt"
```
- Mount the event logs volume into each application pod (you can also use the operator's mutating webhook to centralize this). An example manifest with the mentioned config is shown below:
```yaml
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-java-pi
  namespace: spark-apps
spec:
  type: Java
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.4
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  imagePullPolicy: Always
  sparkVersion: 2.4.4
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt"
  restartPolicy:
    type: Never
  volumes:
    - name: spark-data
      persistentVolumeClaim:
        claimName: spark-pvc
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.4
    serviceAccount: spark
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.4
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
```
- Install the Spark history server, mounting the shared volume. You will then have access to the events in the history server UI:
```yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  replicas: 1
  template:
    metadata:
      name: spark-history-server
      labels:
        app: spark-history-server
    spec:
      containers:
        - name: spark-history-server
          image: gcr.io/spark-operator/spark:v2.4.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "100m"
          command:
            - /sbin/tini
            - -s
            - --
            - /opt/spark/bin/spark-class
            - -Dspark.history.fs.logDirectory=/data/
            - org.apache.spark.deploy.history.HistoryServer
          ports:
            - name: http
              protocol: TCP
              containerPort: 18080
          readinessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          livenessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: spark-pvc
            readOnly: true
```
Feel free to configure an `Ingress` or `Service` for accessing the UI.
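As a minimal sketch, a `ClusterIP` Service for the history server could look like this (the name and selector are assumptions that match the Deployment above):

```yaml
# Minimal Service exposing the history server UI inside the cluster.
# The selector matches the app label used in the Deployment above.
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  selector:
    app: spark-history-server
  ports:
    - name: http
      port: 18080
      targetPort: http
```

You could then reach the UI locally with `kubectl port-forward svc/spark-history-server 18080:18080 -n spark-apps`.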
You can also use Google Cloud Storage, Azure Blob Storage, or AWS S3 as the event log location. For this you will need to install some extra JARs, so I would recommend having a look at the Lightbend Spark history server image and charts.
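For example, pointing event logs at S3 would look roughly like the sketch below. The bucket name is hypothetical, and this assumes the `hadoop-aws` JAR and a matching AWS SDK are on the classpath of both the applications and the history server:

```yaml
# Application side: write event logs to S3 via the s3a connector.
sparkConf:
  "spark.eventLog.enabled": "true"
  # Hypothetical bucket; requires hadoop-aws and AWS SDK jars in the image.
  "spark.eventLog.dir": "s3a://my-spark-logs/events"
```

On the history server side, you would point `spark.history.fs.logDirectory` at the same `s3a://` path instead of the mounted volume.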