How do you manage per-environment data in Docker-based microservices?


Overview

Long post!

  • ENTRYPOINT is your friend
  • Building Microservices by Sam Newman is great
  • Inter-service security tip: 2-way TLS may work, but may present latency issues
  • I will get into a real example from my team. We could not use a configuration server, and things have gotten ... interesting. Manageable for now, but it may not scale as the company adds more services.
  • Configuration servers seem like a better idea

Update: Almost two years later, we might move to Kubernetes, and start using the etcd-powered ConfigMaps feature that ships with it. I'll mention this again in the configuration servers section. The post may still be worth reading if you are interested in these subjects. We'll still be using ENTRYPOINT and some of the same concepts, just different tools.

ENTRYPOINT

I suggest that ENTRYPOINT is the key to managing environment-specific configuration for your Docker containers.

In short: create a script to bootstrap your service before starting, and use ENTRYPOINT to execute this script.
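In its simplest form, the pattern looks something like this (just a sketch; the base image, paths, and JAR name are placeholders):

    # --- Dockerfile (minimal sketch) ---
    FROM openjdk:8-jre
    COPY app.jar /app/app.jar
    COPY entrypoint.sh /app/entrypoint.sh
    ENTRYPOINT exec /app/entrypoint.sh

    # --- entrypoint.sh (minimal sketch) ---
    #!/bin/sh
    # ...fetch or assemble environment-specific config here...
    exec java $JAVA_OPTS -jar /app/app.jar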

I will go into detail contextualizing this, and also explain how we do this without a configuration server. It gets a bit deep, but it's not unmanageable. Then, I end with details on configuration servers, a better solution for many teams.

Building Microservices

You're right that these are common concerns, but there just aren't one-size-fits-all solutions. The most general solution is a configuration server. (The most general but still not one-size-fits-all.) But perhaps you cannot use one of these: we were barred from using a configuration server by the Security team.

I strongly recommend reading Building Microservices by Sam Newman, if you haven't yet. It examines all the common challenges and discusses many possible solutions, while also giving helpful perspective from a seasoned architect. (Side note: don't worry about a perfect solution to your configuration management; start with a "good enough" solution for your current set of microservices and environments. You can iterate and improve, so you should try to get useful software to your customers ASAP, then improve in subsequent releases.)

Cautionary tale?

Rereading this ... I cringe a little at how much it takes to explain fully. From the Zen of Python:

If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

I'm not thrilled with the solution we have. Yet it's a workable solution, given we couldn't use a configuration server. It's also a real world example.

If you read it and think, "Oh god no, why would I want all that!" then you know you need to look hard at configuration servers.

Inter-service security

It seems like you are also concerned with how different microservices authenticate each other.

For artifacts and configuration related to this authentication ... treat them like any other configuration artifacts.

What are your requirements around inter-service security? In your post, it sounds like you're describing app-tier, username/password authentication. Maybe that makes sense for the services you have in mind. But you should also consider Two-Way TLS: "this configuration requires the client to provide their certificate to the server, in addition to the server providing theirs to the client." Generating and managing these certificates can get complicated ... but however you choose to do it, you'll shuffle around the config/artifacts like any other config/artifacts.
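For example, with Spring Boot (which comes up later in this post), enabling 2-way TLS on the embedded server is mostly declarative configuration. A hedged sketch; the keystore paths and passwords are placeholders:

    # application.yml (sketch -- paths/passwords are placeholders)
    server:
      ssl:
        key-store: classpath:keystore.p12
        key-store-password: changeit
        trust-store: classpath:truststore.p12
        trust-store-password: changeit
        client-auth: need   # require clients to present a trusted certificate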

Note that 2-way TLS may introduce latency issues at high volumes. We're not there yet. We are using other measures besides 2-way TLS, and we may ditch 2-way TLS once those are proven out over time.


Real-world example from my team

My current team is doing something that combines two of the approaches you mentioned (paraphrased):

  • Bake configuration at build-time
  • Pull configuration at run-time

My team is using Spring Boot, which has extensive Externalized Configuration support, including a "profiles" system. It's complex and powerful, with all the pros/cons that go with that (I won't get into them here).

While this is out-of-the-box with Spring Boot, the ideas are general. I prefer Dropwizard for Java microservices, or Flask in Python; in both of those cases, you could do something similar to what Spring Boot has going on ... you'll just have to build more of it yourself. Good and bad: these nimble little frameworks are more flexible than Spring, but when you're writing more code and doing more integrations, there's more responsibility on YOU to QA and test your complex/flexible config support.

I'll continue with the Spring Boot example because of first-hand experience, but not because I'm recommending it! Use what is right for your team.

In the case of Spring Boot, you can activate multiple profiles at a time. That means you can have a base configuration, then override with more specific configuration. We keep a base configuration, application.yml, in src/main/resources. This config is packaged with the shippable JAR, and it is always picked up when the JAR is executed. Therefore we include all default settings (common to all environments) in this file. Example: the configuration block that says, "Embedded Tomcat, always use TLS with these cipher suites enabled." (server.ssl.ciphers)
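As a sketch (the cipher list here is illustrative, not a recommendation):

    # src/main/resources/application.yml (sketch)
    server:
      ssl:
        enabled: true
        ciphers:
          - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
          - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384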

When just one or two variables needs to be overwritten for a certain environment, we leverage Spring Boot's support for getting configuration from environment variables. Example: we set the URL to our Service Discovery using an environment variable. This overrides any default in the shipped/pulled configuration files. Another example: we use an environment variable SPRING_PROFILES_ACTIVE to specify which Configuration Profiles are active.
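Spring Boot's relaxed binding maps environment variables onto properties, so this looks something like the following (values are placeholders; the Eureka property assumes Spring Cloud Netflix, which we use):

    # sketch -- values are placeholders
    export SPRING_PROFILES_ACTIVE=aws,prod
    # maps to eureka.client.serviceUrl.defaultZone
    export EUREKA_CLIENT_SERVICEURL_DEFAULTZONE=http://discovery.internal:8761/eureka/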

We also want to make sure master contains a tested, working config for development environments. src/main/resources/application.yml has sane defaults. In addition we put dev-only config in config/application-dev.yml, and check that in. The config directory is picked up automatically, but not shipped in the JAR. Nice feature. Developers know (from the README and other documentation) that in a dev environment, all of our Spring Boot microservices require the dev profile to be activated.
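So a dev-only override file might look like this (a sketch; the property shown assumes Eureka, per later sections):

    # config/application-dev.yml (sketch -- checked in, but not shipped in the JAR)
    eureka:
      client:
        serviceUrl:
          defaultZone: http://localhost:8761/eureka/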

For environments besides dev, you can probably already see some options ... Any one of these could do (almost) everything you need, and you can mix and match. These options overlap with some ideas you mentioned in your original post.

  1. Maintain environment-specific profiles like application-stage.yml, application-prod.yml, and so on, that override settings with deviations from defaults (in a very heavily-locked-down git repository)
  2. Maintain modular, vendor-specific profiles like application-aws.yml, application-mycloudvendor.yml (where you store this will depend on whether it contains secrets). These may contain values that cut across stage, prod, etc.
  3. Use environment variables to override any relevant settings, at runtime; including picking profile(s) from 1 and 2
  4. Use automation to bake in hardcoded values (templates) at build or deployment time (output into a heavily-locked-down repository of some sort, possibly distinct from (1)'s repository)

(1), (2), and (3) work well together. We are happily doing all three and it's actually pretty easy to document, reason about, and maintain (after getting the initial hang of it).

You said ...

I suppose you could create a repo of per-environment properties files or script [...] You would need a ton of scripts, though.

It can be manageable. The scripts that pull or bake in config can be uniform across all services. Maybe the script is copied when somebody clones your microservice template (btw: you should have an official microservice template!). Or maybe it's a Python script on an internal PyPI server. More on this after we talk about Docker.

Since Spring Boot has such good support for (3), and support for using defaults/templating in YML files, you may not need (4). But here's where things get very specific to your organization. The Security Engineer on our team wanted us to use (4) to bake in some specific values for environments beyond dev: passwords. This Engineer didn't want the passwords "floating around" in environment variables, mainly because then -- who would set them? The Docker caller? An AWS ECS Task Definition (viewable through the AWS web UI)? In those cases, the passwords could be exposed to automation engineers, who wouldn't necessarily have access to the "locked-down git repository" containing application-prod.yml. (4) might not be needed if you do (1); you could just keep the passwords, hardcoded, in the tightly-controlled repository. But maybe there are secrets to generate at deployment-automation time that you don't want in the same repository as (1). This is our case.

More on (2): we use an aws profile and Spring Boot's "configuration as code" to make a startup-time call to get AWS metadata, and override some config based on that. Our AWS ECS Task Definitions activate the aws profile. The Spring Cloud Netflix documentation gives an example like this:

    @Bean
    @Profile("aws")
    public EurekaInstanceConfigBean eurekaInstanceConfig() {
      EurekaInstanceConfigBean b = new EurekaInstanceConfigBean();
      AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
      b.setDataCenterInfo(info);
      return b;
    }

Next, Docker. Environment variables are a very good way to pass configuration arguments into Docker containers. We don't use any command-line or positional arguments because of some gotchas we encountered with ENTRYPOINT. It's easy to pass --env SPRING_PROFILES_ACTIVE=dev or --env SPRING_PROFILES_ACTIVE=aws,prod ... whether from the command line, or from a supervisor/scheduler such as AWS ECS or Mesos/Marathon. Our entrypoint.sh also facilitates passing JVM flags that have nothing to do with Spring: we use the common JAVA_OPTS convention for this.
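For example (image name and values are placeholders):

    # sketch -- image name and values are placeholders
    docker run \
      --env SPRING_PROFILES_ACTIVE=aws,prod \
      --env JAVA_OPTS="-Xms256m -Xmx512m" \
      my-registry.internal/my-service:1.0.0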

(Oh, I should mention ... we also use Gradle for our builds. At the moment, we wrap docker build, docker run, and docker push with Gradle tasks. Our Dockerfile is templated, so again, option #4 from above. We have variables like @agentJar@ that get overwritten at build time. I really don't like this, and I think it could be better handled with plain old configuration (-Dagent.jar.property.whatever). This will probably go away. But I'm mentioning it for completeness. Something I am happy about: nothing in the build, Dockerfile, or entrypoint.sh script is coupled tightly to a certain deployment context (such as AWS). All of it works in dev environments as well as deployed environments. So we don't have to deploy the Docker image to test it: it's portable, as it should be.)

We have a folder src/main/docker containing the Dockerfile and entrypoint.sh (the script called by ENTRYPOINT; this is baked into the Dockerfile). Our Dockerfile and entrypoint.sh are nearly completely uniform across all microservices. These are duplicated when you clone our microservice template. Unfortunately, sometimes you have to copy/paste updates. We haven't found a good way around this yet, but it's not terribly painful.

The Dockerfile does the following at build time (a sketch follows the list):

  1. Derives from our "golden" base image for Java applications
  2. Grabs our tool for pulling configuration. (Grabs from an internal server available to any dev or Jenkins machine doing a build.) (You could also just use Linux tools like wget, plus DNS/convention-based naming for where to get it. You could also use AWS S3 with convention-based naming.)
  3. Copies some things into the image, like the JAR and entrypoint.sh
  4. ENTRYPOINT exec /app/entrypoint.sh
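Putting those steps together, a hedged sketch (the base image, internal URL, and tool name are placeholders standing in for our internal equivalents):

    # Dockerfile (sketch -- names, URLs, and paths are placeholders)
    FROM my-registry.internal/golden-java:8

    # grab the config-pulling tool from an internal server
    ADD http://build-tools.internal/config-puller /usr/local/bin/config-puller

    # copy the app and the entrypoint script into the image
    COPY build/libs/app.jar /app/app.jar
    COPY src/main/docker/entrypoint.sh /app/entrypoint.sh
    RUN chmod +x /usr/local/bin/config-puller /app/entrypoint.sh

    ENTRYPOINT exec /app/entrypoint.sh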

The entrypoint.sh does the following at run time (a sketch follows the list):

  1. Uses our tool to pull configuration. (Some logic to understand that if the aws profile is not active, the aws config file is not expected.) Dies immediately and loudly if there are any issues.
  2. exec java $JAVA_OPTS -jar /app/app.jar (picks up all the properties files, environment variables, etc.)
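A sketch of such a script (config-puller is a stand-in name for our internal tool; its flags are invented for illustration):

    #!/bin/sh
    # entrypoint.sh (sketch)
    set -e  # any failure kills the container immediately and loudly

    # If a config folder was already provided (e.g. mounted via docker --volume
    # in dev), use it as-is; otherwise pull what the active profiles call for.
    if [ ! -d /app/config ]; then
        config-puller --profiles "$SPRING_PROFILES_ACTIVE" --dest /app/config
    fi

    # Hand off to the JVM; Spring Boot picks up ./config, env vars, etc.
    cd /app
    exec java $JAVA_OPTS -jar /app/app.jar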

So we've covered that at application startup time, configuration is pulled from somewhere ... but where? Per the earlier points, it could be a git repository. You could pull down all profiles and then use SPRING_PROFILES_ACTIVE to say which are active; but then you might pull application-prod.yml onto a stage machine (not good). So instead, your configuration-puller logic could look at SPRING_PROFILES_ACTIVE and pull only what is needed.

If you are using AWS, you could use S3 bucket(s) instead of a git repository. This may allow for better access control. Instead of application-prod.yml and application-stage.yml living in the same repo/bucket, you could make it so that application-envspecific.yml always has the required configuration, in an S3 bucket with a conventional name in the given AWS account. I.e., "Get the config from s3://ecs_config/$ENV_NAME/application-envspecific.yml" (where $ENV_NAME comes from the entrypoint.sh script or the ECS Task Definition).
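With the standard AWS CLI, that pull could be a one-liner (following the naming convention above):

    # sketch -- bucket/paths follow the convention described above
    aws s3 cp "s3://ecs_config/$ENV_NAME/application-envspecific.yml" /app/config/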

I mentioned that the Dockerfile works portably, and isn't coupled to certain deployment contexts. That is because entrypoint.sh is defined to check for config files in a flexible way; it just wants the config files. So if you use Docker's --volume option to mount a folder with config, the script will be happy, and it won't try to pull anything from an external server.
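For example, in a dev environment (image name and paths are placeholders):

    # sketch -- mount local config instead of pulling from an external server
    docker run \
      --env SPRING_PROFILES_ACTIVE=dev \
      --volume "$(pwd)/config:/app/config" \
      my-registry.internal/my-service:1.0.0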

I won't get into the deployment automation much ... but I'll mention quickly that we use Terraform, boto3, and some custom Python wrapper code, with jinja2 for templating (baking in those couple of values that need to be baked in).
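To give a flavor of the jinja2 part, a sketch (the template path and the fetch_secret() helper are hypothetical):

    # sketch: bake deployment-time values into a config template
    from jinja2 import Template

    with open("templates/application-envspecific.yml.j2") as f:
        template = Template(f.read())

    # fetch_secret() is a hypothetical stand-in for however you retrieve
    # secrets at deployment-automation time
    rendered = template.render(db_password=fetch_secret("db/password"))

    with open("out/application-envspecific.yml", "w") as f:
        f.write(rendered)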

Here's a serious limitation of this approach: the microservice process has to be killed and restarted to re-download and reload config. Now, with a cluster of stateless services, this does not necessarily mean downtime (given certain things: client-side load balancing, Ribbon configured for retries, and enough horizontal scale that some instances are always running in the pool). So far it is working out, but the microservices still have pretty low load. Growth is coming. We shall see.

There are many more ways to solve these challenges. Hopefully this exercise has got you thinking about what will work for your team. Just try to get some things going. Prototype rapidly and you'll shake out the details as you go.

Perhaps better: configuration servers

I think this is the more common solution: Configuration Servers. You mentioned ZooKeeper. There's also Consul. ZooKeeper and Consul both offer Configuration Management and Service Discovery. There's also etcd.

In our case, the Security team wasn't comfortable with a centralized Configuration Management server. We decided to use NetflixOSS's Eureka for Service Discovery, but hold off on a Configuration Server. If we wind up disliking the methods above, we may switch to Archaius for Configuration Management. Spring Cloud Netflix aims to make these integrations easy for Spring Boot users. Though I think it wants you to use Spring Cloud Config (Server/Client) instead of Archaius. Haven't tried it yet.

Configuration servers seem much easier to explain and think about. If you can, you should start off with a configuration server.

If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.

Comparisons of configuration servers

If you decide to try a config server, you'll need to do a research spike. Here are some good resources to start you off:

If you try Consul, you should watch this talk, "Operating Consul as an Early Adopter". Even if you try something else besides Consul, the talk has nuggets of advice and insight for you.

16/05/11 EDIT: The ThoughtWorks Technology Radar has now moved Consul into the "Adopt" category (history of their evaluation is here).

17/06/01 EDIT: We are considering moving to Kubernetes for multiple reasons. If we do we will leverage the etcd-powered ConfigMaps feature that ships with K8S. That's all for now on this subject :-)

More resources