Building fat Spark JARs & bundles for Kubernetes deployment
So in the end I got everything working using Helm, the spark-on-k8s-operator and sbt-docker.
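This assumes the relevant sbt plugins are on the build classpath. As a minimal sketch, `project/plugins.sbt` would contain something like the following (the version strings are placeholders, use whatever matches your sbt version):

```scala
// project/plugins.sbt — versions are placeholders
addSbtPlugin("com.eed3si9n"      % "sbt-assembly"        % "x.y.z") // fat JAR assembly
addSbtPlugin("se.marcuslonnberg" % "sbt-docker"          % "x.y.z") // Docker image building
addSbtPlugin("com.typesafe.sbt"  % "sbt-native-packager" % "x.y.z") // JavaAppPackaging / AshScriptPlugin
```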
First I extract some of the config into variables in build.sbt, so they can be used by both the assembly and the Docker image generator.
```scala
// Dependencies that should not be compiled into the fat JAR,
// but copied into the Docker image instead
val externalDependencies = Seq(
  "org.postgresql" % "postgresql" % postgresVersion,
  "io.prometheus.jmx" % "jmx_prometheus_javaagent" % jmxPrometheusVersion
)

// Settings
val team = "hazelnut"
val importerDescription = "..."
val importerMainClass = "..."
val targetDockerJarPath = "/opt/spark/jars"

// Derive (orgDir, moduleName, version, jarFile) for each external dependency
val externalPaths = externalDependencies.map { module =>
  val parts = module.toString().split(""":""")
  val orgDir = parts(0).replaceAll("""\.""", """/""")
  val moduleName = parts(1).replaceAll("""\.""", """/""")
  val version = parts(2)
  val jarFile = moduleName + "-" + version + ".jar"
  (orgDir, moduleName, version, jarFile)
}
```
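To make the mapping concrete: assuming a hypothetical `postgresVersion` of `"42.2.5"`, the postgresql entry above would yield the following tuple, which the Docker settings below turn into a Maven Central URL:

```scala
// Hypothetical result for "org.postgresql" % "postgresql" % "42.2.5"
val example = ("org/postgresql", "postgresql", "42.2.5", "postgresql-42.2.5.jar")
// ...which maps to the download URL:
// https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/postgresql-42.2.5.jar
```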
Next I define the assembly settings that create the fat JAR (these can be whatever your project needs):
```scala
lazy val assemblySettings = Seq(
  // Assembly options
  assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false),
  assembly / assemblyMergeStrategy := {
    case PathList("reference.conf") => MergeStrategy.concat
    case PathList("META-INF", _ @ _*) => MergeStrategy.discard
    case "log4j.properties" => MergeStrategy.concat
    case _ => MergeStrategy.first
  },
  assembly / logLevel := sbt.util.Level.Error,
  assembly / test := {},
  pomIncludeRepository := { _ => false }
)
```
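One thing that's easy to miss (and not shown in the settings above): the base image already ships with Spark, so the Spark artifacts themselves should stay out of the fat JAR. A sketch of how to mark them, assuming the usual `sparkVersion` variable:

```scala
// Sketch: Spark is provided by the base image, so exclude it from the assembly
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql"  % sparkVersion % Provided
)
```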
Then the docker settings are defined:
```scala
lazy val dockerSettings = Seq(
  docker / imageNames := Seq(
    ImageName(s"$team/${name.value}:latest"),
    ImageName(s"$team/${name.value}:${version.value}")
  ),
  docker / dockerfile := {
    // The assembly task generates a fat JAR file
    val artifact: File = assembly.value
    val artifactTargetPath = s"$targetDockerJarPath/$team-${name.value}.jar"

    externalPaths.map { case (extOrgDir, extModuleName, extVersion, jarFile) =>
      val url = List("https://repo1.maven.org/maven2", extOrgDir, extModuleName, extVersion, jarFile).mkString("/")
      val target = s"$targetDockerJarPath/$jarFile"
      Instructions.Run.exec(List("curl", url, "--output", target, "--silent"))
    }.foldLeft(new Dockerfile {
      // https://hub.docker.com/r/lightbend/spark/tags
      from(s"lightbend/spark:${openShiftVersion}-OpenShift-${sparkVersion}-ubuntu-${scalaBaseVersion}")
    }) { case (df, run) => df.addInstruction(run) }
      .add(artifact, artifactTargetPath)
  }
)
```
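If you tag for a private registry instead of a plain local name, sbt-docker's `ImageName` can also be built from its parts; a sketch with a placeholder registry host:

```scala
// Sketch: fully qualified image name ("registry.example.com" is a placeholder)
docker / imageNames := Seq(
  ImageName(
    registry   = Some("registry.example.com"),
    namespace  = Some(team),
    repository = name.value,
    tag        = Some(version.value)
  )
)
```

The tagged images can then be pushed with sbt-docker's `dockerPush` task.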
And I create a task to generate the Helm chart and values files:
```scala
lazy val createImporterHelmChart: Def.Initialize[Task[Seq[File]]] = Def.task {
  val chartFile = baseDirectory.value / "../helm" / "Chart.yaml"
  val valuesFile = baseDirectory.value / "../helm" / "values.yaml"

  val jarDependencies = externalPaths.map { case (_, extModuleName, _, jarFile) =>
    extModuleName -> s""""local://$targetDockerJarPath/$jarFile""""
  }.toMap

  val chartContents =
    s"""# Generated by build.sbt. Please don't manually update
       |apiVersion: v1
       |name: $team-${name.value}
       |version: ${version.value}
       |description: $importerDescription
       |""".stripMargin

  val valuesContents =
    s"""# Generated by build.sbt. Please don't manually update
       |version: ${version.value}
       |sparkVersion: $sparkVersion
       |image: $team/${name.value}:${version.value}
       |jar: local://$targetDockerJarPath/$team-${name.value}.jar
       |mainClass: $importerMainClass
       |jarDependencies: [${jarDependencies.values.mkString(", ")}]
       |fileDependencies: []
       |jmxExporterJar: ${jarDependencies.getOrElse("jmx_prometheus_javaagent", "null").replace("local://", "")}
       |""".stripMargin

  IO.write(chartFile, chartContents)
  IO.write(valuesFile, valuesContents)
  Seq(chartFile, valuesFile)
}
```
Finally, it all comes together in the project definition in build.sbt:
```scala
lazy val importer = (project in file("importer"))
  .enablePlugins(JavaAppPackaging)
  .enablePlugins(sbtdocker.DockerPlugin)
  .enablePlugins(AshScriptPlugin)
  .dependsOn(util)
  .settings(
    commonSettings,
    testSettings,
    assemblySettings,
    dockerSettings,
    scalafmtSettings,
    name := "etl-importer",
    Compile / mainClass := Some(importerMainClass),
    Compile / resourceGenerators += createImporterHelmChart.taskValue
  )
```
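`commonSettings`, `testSettings` and `scalafmtSettings` are specific to my build and omitted here; as a rough sketch, `commonSettings` just holds the usual shared values, along the lines of:

```scala
// Sketch only — placeholders, not the actual settings from this build
lazy val commonSettings = Seq(
  organization := "com.example", // placeholder
  scalaVersion := "2.11.12"      // placeholder, aligned with the Spark base image
)
```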
All of this comes together with values files per environment and the following Helm template:
```yaml
apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
  name: {{ .Chart.Name | trunc 64 }}
  labels:
    name: {{ .Chart.Name | trunc 63 | quote }}
    release: {{ .Release.Name | trunc 63 | quote }}
    revision: {{ .Release.Revision | quote }}
    sparkVersion: {{ .Values.sparkVersion | quote }}
    version: {{ .Chart.Version | quote }}
spec:
  type: Scala
  mode: cluster
  image: {{ .Values.image | quote }}
  imagePullPolicy: {{ .Values.imagePullPolicy }}
  mainClass: {{ .Values.mainClass | quote }}
  mainApplicationFile: {{ .Values.jar | quote }}
  sparkVersion: {{ .Values.sparkVersion | quote }}
  restartPolicy:
    type: Never
  deps:
    {{- if .Values.jarDependencies }}
    jars:
      {{- range .Values.jarDependencies }}
      - {{ . | quote }}
      {{- end }}
    {{- end }}
...
```
I can now build Docker images using

```bash
sbt [project name]/docker
```

and deploy them using

```bash
helm install ./helm -f ./helm/values-minikube.yaml --namespace=[ns] --name [name]
```
It can probably be made prettier, but for now it works like a charm.