Building fat Spark jars & bundles for Kubernetes deployment


So in the end I got everything working using Helm, the spark-on-k8s-operator and sbt-docker.
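For completeness: the build relies on a handful of sbt plugins (sbt-assembly, sbt-docker, sbt-native-packager for JavaAppPackaging/AshScriptPlugin, and sbt-scalafmt). A rough project/plugins.sbt sketch, where the version numbers are placeholders rather than necessarily what I used:

// project/plugins.sbt -- minimal sketch; the version numbers are placeholders.
addSbtPlugin("com.eed3si9n"      % "sbt-assembly"        % "0.14.10")
addSbtPlugin("se.marcuslonnberg" % "sbt-docker"          % "1.8.2")
addSbtPlugin("com.typesafe.sbt"  % "sbt-native-packager" % "1.7.6") // JavaAppPackaging / AshScriptPlugin
addSbtPlugin("org.scalameta"     % "sbt-scalafmt"        % "2.4.2")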

First I extract some of the config into variables in the build.sbt, so they can be used by both the assembly and the docker generator.

// define some dependencies that should not be compiled, but copied into docker
val externalDependencies = Seq(
  "org.postgresql" % "postgresql" % postgresVersion,
  "io.prometheus.jmx" % "jmx_prometheus_javaagent" % jmxPrometheusVersion
)

// Settings
val team = "hazelnut"
val importerDescription = "..."
val importerMainClass = "..."
val targetDockerJarPath = "/opt/spark/jars"

val externalPaths = externalDependencies.map(module => {
  val parts = module.toString().split(""":""")
  val orgDir = parts(0).replaceAll("""\.""", """/""")
  val moduleName = parts(1).replaceAll("""\.""", """/""")
  val version = parts(2)
  val jarFile = moduleName + "-" + version + ".jar"
  (orgDir, moduleName, version, jarFile)
})
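These snippets also reference a handful of version vals (postgresVersion, jmxPrometheusVersion, and later sparkVersion, openShiftVersion and scalaBaseVersion) that live elsewhere in my build.sbt. They are plain vals along these lines; note the numbers below are placeholders, not the versions actually used:

// Version vals referenced throughout this post -- placeholders only.
val postgresVersion      = "42.2.5" // placeholder
val jmxPrometheusVersion = "0.11.0" // placeholder
val sparkVersion         = "2.4.0"  // placeholder
val openShiftVersion     = "2.4.0"  // placeholder, part of the lightbend/spark image tag
val scalaBaseVersion     = "2.11"   // placeholder, Scala binary version of the base image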

Next I define the assembly settings to create the fat jar (the merge strategies here are just what works for my project; use whatever you need):

lazy val assemblySettings = Seq(
  // Assembly options
  assembly / assemblyOption := (assembly / assemblyOption).value.copy(includeScala = false),
  assembly / assemblyMergeStrategy := {
    case PathList("reference.conf") => MergeStrategy.concat
    case PathList("META-INF", _ @ _*) => MergeStrategy.discard
    case "log4j.properties" => MergeStrategy.concat
    case _ => MergeStrategy.first
  },
  assembly / logLevel := sbt.util.Level.Error,
  assembly / test := {},
  pomIncludeRepository := { _ => false }
)
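The external dependencies above are exactly the jars I don't want baked into the fat jar, since the Docker image downloads them separately (see below). Assuming they are needed on the compile classpath at all, one way to keep sbt-assembly from bundling them is to add them in Provided scope; a minimal sketch, not necessarily how my actual build wires them in:

// Sketch: compile against the external jars, but let sbt-assembly leave them out
// of the fat jar (Provided dependencies are excluded by default); the Docker image
// fetches the same jars from Maven Central instead.
libraryDependencies ++= externalDependencies.map(_ % Provided)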

Then the docker settings are defined:

lazy val dockerSettings = Seq(
  imageNames in docker := Seq(
    ImageName(s"$team/${name.value}:latest"),
    ImageName(s"$team/${name.value}:${version.value}")
  ),
  dockerfile in docker := {
    // The assembly task generates a fat JAR file
    val artifact: File = assembly.value
    val artifactTargetPath = s"$targetDockerJarPath/$team-${name.value}.jar"

    externalPaths
      .map { case (extOrgDir, extModuleName, extVersion, jarFile) =>
        val url = List("https://repo1.maven.org/maven2", extOrgDir, extModuleName, extVersion, jarFile).mkString("/")
        val target = s"$targetDockerJarPath/$jarFile"
        Instructions.Run.exec(List("curl", url, "--output", target, "--silent"))
      }
      .foldLeft(new Dockerfile {
        // https://hub.docker.com/r/lightbend/spark/tags
        from(s"lightbend/spark:${openShiftVersion}-OpenShift-${sparkVersion}-ubuntu-${scalaBaseVersion}")
      }) { case (df, run) =>
        df.addInstruction(run)
      }
      .add(artifact, artifactTargetPath)
  }
)
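Because the dockerfile definition calls assembly.value, running the docker task builds the fat jar first, so no extra wiring is needed. sbt-docker also exposes a few knobs for how the image is built; if you want to force a fresh base image pull and skip the layer cache, something along these lines can be added to the settings above (optional, and only a sketch):

// Optional: sbt-docker build options -- always pull the base image and
// disable the layer cache. Not part of the settings shown above.
buildOptions in docker := BuildOptions(
  cache = false,
  removeIntermediateContainers = BuildOptions.Remove.Always,
  pullBaseImage = BuildOptions.Pull.Always
)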

And I create a task to generate the Helm Chart.yaml and values.yaml:

lazy val createImporterHelmChart: Def.Initialize[Task[Seq[File]]] = Def.task {
  val chartFile = baseDirectory.value / "../helm" / "Chart.yaml"
  val valuesFile = baseDirectory.value / "../helm" / "values.yaml"

  val jarDependencies = externalPaths.map {
    case (_, extModuleName, _, jarFile) =>
      extModuleName -> s""""local://$targetDockerJarPath/$jarFile""""
  }.toMap

  val chartContents =
    s"""# Generated by build.sbt. Please don't manually update
       |apiVersion: v1
       |name: $team-${name.value}
       |version: ${version.value}
       |description: $importerDescription
       |""".stripMargin

  val valuesContents =
    s"""# Generated by build.sbt. Please don't manually update
       |version: ${version.value}
       |sparkVersion: $sparkVersion
       |image: $team/${name.value}:${version.value}
       |jar: local://$targetDockerJarPath/$team-${name.value}.jar
       |mainClass: $importerMainClass
       |jarDependencies: [${jarDependencies.values.mkString(", ")}]
       |fileDependencies: []
       |jmxExporterJar: ${jarDependencies.getOrElse("jmx_prometheus_javaagent", "null").replace("local://", "")}
       |""".stripMargin

  IO.write(chartFile, chartContents)
  IO.write(valuesFile, valuesContents)
  Seq(chartFile, valuesFile)
}

Finally it all comes together in a project definition in build.sbt. Hooking the task into Compile / resourceGenerators means the chart and values files are regenerated whenever resources are generated (e.g. on compile or package):

lazy val importer = (project in file("importer"))
  .enablePlugins(JavaAppPackaging)
  .enablePlugins(sbtdocker.DockerPlugin)
  .enablePlugins(AshScriptPlugin)
  .dependsOn(util)
  .settings(
    commonSettings,
    testSettings,
    assemblySettings,
    dockerSettings,
    scalafmtSettings,
    name := "etl-importer",
    Compile / mainClass := Some(importerMainClass),
    Compile / resourceGenerators += createImporterHelmChart.taskValue
  )
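The project definition also references commonSettings, testSettings, scalafmtSettings and a util module that are specific to my build and not the point of this post. If you want a self-contained starting point, placeholders along these lines are enough; everything in this block is illustrative, not my actual configuration:

// Illustrative placeholders only -- the real settings and the util module are
// defined elsewhere in the build and look different.
lazy val commonSettings = Seq(
  organization := "com.example", // placeholder
  scalaVersion := "2.11.12"      // placeholder
)
lazy val testSettings     = Seq(Test / fork := true)
lazy val scalafmtSettings = Seq(scalafmtOnCompile := true)
lazy val util             = project in file("util")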

This is then combined with values files per environment and a Helm template for the SparkApplication resource:

apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
  name: {{ .Chart.Name | trunc 64 }}
  labels:
    name: {{ .Chart.Name | trunc 63 | quote }}
    release: {{ .Release.Name | trunc 63 | quote }}
    revision: {{ .Release.Revision | quote }}
    sparkVersion: {{ .Values.sparkVersion | quote }}
    version: {{ .Chart.Version | quote }}
spec:
  type: Scala
  mode: cluster
  image: {{ .Values.image | quote }}
  imagePullPolicy: {{ .Values.imagePullPolicy }}
  mainClass: {{ .Values.mainClass | quote }}
  mainApplicationFile: {{ .Values.jar | quote }}
  sparkVersion: {{ .Values.sparkVersion | quote }}
  restartPolicy:
    type: Never
  deps:
    {{- if .Values.jarDependencies }}
    jars:
    {{- range .Values.jarDependencies }}
      - {{ . | quote }}
    {{- end }}
    {{- end }}
...

I can now build packages using

sbt [project name]/docker

and deploy them using

helm install ./helm -f ./helm/values-minikube.yaml --namespace=[ns] --name [name]

It can probably be made prettier, but for now this works like a charm.