Generating CI Configurations at Scale

Intro

You often hear about scale issues when discussing capacity, traffic, users and other resource-related topics. However, scale is often in the details: the processes and how things run at specific companies.

At Globality, we have ~80 microservices running in production. We have FE, BE and Gateway applications. JS, Python and third party apps. Most applications also require internal libraries to run.

All of this needs to be built, tested and deployed.

We use CircleCI for everything, but Travis or any other SaaS-based CI shares the same core concept: a configuration file that lives inside the project source code and contains instructions on how to build, test and deploy the project.

Even before we had 80 services, we had significant pain with this concept. Maintaining those files and changing or adding things to the build process was a huge pain.

Imagine you have to edit 80 files in 80 repos every time you want to add a feature to your build process. Say you want to annotate Grafana for every release. Can you imagine that?

The Globality Way

Before writing about the implementation, there’s quite a bit of context to cover about what we like to call “The Globality Way”. It’s the way we do things and the decisions we made along the way about how we want to run an engineering culture.

Consistency is key

Lots of companies do microservices, and many of them use that as a way to have polyglot teams doing things “their way”. We don’t focus teams around services but around business problems. We might have one team work on an approval flow for large customers while another team builds the next generation of matching and another improves usability of our interactive dialog experience.

This led us to decide that services should look the same.

Python services use our home-grown microcosm library and Node services use our internal nodule* libraries. We are working on releasing the way we do FE as well.

We have internal (and open source) templates for what services should look like and generally speaking, when you land on a service, you feel right at home. You might not know the model names but you know where the models, routes and controllers are and exactly where to go when you need to edit/change/remove things.

Dependencies and version upgrades happen all the time

Going back to consistency, we open/close the dependency chain with every release. This means that every Python/Node project (client side as well) will get fresh dependencies. We remove the yarn.lock, requirements.txt and basically install everything with the newest version. This happens every release.

The same goes for Python/Node versions: when we upgrade, everything upgrades together.

What we found is that when you do that continuously, the cost is much lower than you imagine. If you try to update Node dependencies 6 months into a project, you are in for a world of pain. When you do it every two weeks, you really don’t have to fix breaks all that often.

Summing Up The Globality Way

In a post like this, the context is often missing. I think it’s important to understand the decisions that led to the way a company does things.

I personally feel that this is what makes Globality great and what makes the engineering organization run smoothly.

The problem(s)

Configuration

As I described above, the problem is that the configuration is not flexible enough for our needs. I am not talking about the configuration DSL; the process of configuring projects simply stopped scaling.

When engineers needed to add a service, they were asking “Where is the latest configuration?”

We started off with CookieCutter templates for services like here: cookiecutter-microcosm-service. This helped with starting new services but did not help with mutating and changing the configuration of running projects.

Build Image / Machine

We have 3 main types of projects (we have more, but let’s keep it simple): Python, Node and FE.

For Python, just in the last 18 months, we changed the version of Python we use twice. For Node, 3 times. For FE, 3 times as well (Node versions).

So, you want the build image to be the same for all projects and the Python version to be the same for all projects (unless something needs a custom setup, like AI projects).

All builds need common tools; for us, this includes awscli, jq, docker and so on.

CircleCI allows you to install custom dependencies, but this slows down the build significantly and we don’t want that. Also, if we need another dependency installed, we hit the same configuration problem: we need to add it to every single project (sigh).

Sharing a base build image per project type is crucial. When a decision is made to change or add dependencies, all projects must pick it up immediately and without hiccups.

The build image also includes internal libraries that are baked in, so you can call the internal CLI without installing it on every single build.

Self Serve

This is a tricky one. We want configuration to be self-serve; we don’t want any support required from DevOps, SRE or Ops. As an engineer, you should be able to generate your configuration on your own and it should “just work” (I hope this term is not copyrighted).

There should be a single source of truth for “what is the newest configuration?” and “how do I generate the config?”

Rapid changes and consistency

When a change is made, we want to be able to patch all projects quickly and efficiently. We can’t afford copy-pasta or patching a single project at a time, which lets versions decay until some projects run one version and others run a version 3 months older.

The Solution(s)

I am redacting quite a bit of context but making the DSL and the way we solved it really clear.

The globality-build library

We created a library called globality-build, which is published internally. It is the only source of truth for configuration templates and is very self-serve.

The library uses Jinja internally for templating, so it’s something that is battle tested and very familiar to almost everyone (on our team).

The library reads a configuration file and generates the CI configuration files based on its content.

Configuration Template

The solution we came up with was to have a .globality/build.json configuration file per project that will determine the rules to generate the CI configuration.

Here’s what the file looks like:

    {
      "type": "python-service",
      "params": {
        "name": "REDACTED",
        "databases": [
          "elasticsearch"
        ],
        "versions": {
          "elasticsearch": "6.0.1"
        }
      }
    }

The type of the project determines which base template to use from globality-build. Let’s dive into the library template directory for a sec. Here’s a redacted tree of the base templates:

    ├── circleci.base.j2
    ├── circleci.config.frontend.yml.j2
    ├── circleci.config.node-module.yml.j2
    ├── circleci.config.node-service.yml.j2
    ├── circleci.config.python-library.yml.j2
    └── circleci.config.python-service.yml.j2

As you can see, we have a template per project type. Let’s dive into the configuration part for a better understanding.

Here’s the build.json of another project, for example:

    {
      "type": "python-service",
      "params": {
        "name": "REDACTED",
        "database": "postgres"
      }
    }

Now, all it takes is to run globality build circle-gen --config .globality/build.json.

All of the logic is inside globality-build and it will generate the right configuration every single time.

When we add new project types, we add the support in globality-build and generate the configuration from there.
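To make the type-to-template mapping concrete, here is a minimal sketch of how a build.json could be loaded and matched to a template file. The naming rule is inferred from the file names in the tree above, and the function names (`load_build_config`, `template_name_for`) are hypothetical; this is not the actual globality-build code.

```python
import json
from pathlib import Path

# Project types taken from the redacted template tree above. The naming rule
# "circleci.config.<type>.yml.j2" is an assumption inferred from those files.
KNOWN_TYPES = {
    "frontend",
    "node-module",
    "node-service",
    "python-library",
    "python-service",
}


def load_build_config(path):
    """Read a build.json file and check the two top-level keys it needs."""
    config = json.loads(Path(path).read_text())
    for key in ("type", "params"):
        if key not in config:
            raise ValueError(f"{path}: missing required key '{key}'")
    return config


def template_name_for(config):
    """Map the project type to the template file that would render it."""
    project_type = config["type"]
    if project_type not in KNOWN_TYPES:
        raise ValueError(f"Unsupported project type: {project_type}")
    return f"circleci.config.{project_type}.yml.j2"
```

Under this sketch, supporting a new project type means adding one template file and one entry to the set of known types, and every repository of that type can then generate its configuration the same way.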

Here’s a snippet from one of the templates:

    {% block lint %}
      lint:
        <<: *defaults
    
        steps:
          - attach_workspace:
              at: ~/repo
    
          - setup_remote_docker
    
          - run:
              name: Run Lint
              command: |
                docker create -v /src/{{ import_name }}/tests/ --name service_tests alpine:3.4 /bin/true
                docker cp $(pwd)/{{ import_name }}/tests service_tests:/src/{{ import_name }}/
                docker run -it --volumes-from service_tests ${AWS_ECR_DOMAIN}/{{ repo_name }}:${CIRCLE_SHA1} lint
    
    {% endblock %}

Patching Projects When Config Changes

Now that we have a way to generate the configuration for every single project type, we want to patch all projects when the configuration changes.

Configuration changes can include a number of things, from bug fixes in the actual configuration syntax to adding new features like a deployment waiter, Grafana callbacks and much more.

Even when you have a library that can generate the configuration, you still need to manually clone each project and commit the changes after the file is patched. This is something we obviously want to avoid.

The solution was to patch the file directly using the GitHub API.

GitHub File Update API

Repository Content API.

The GitHub API allows you to update a file. If you have the contents of that file in a variable, it’s pretty straightforward to update any file in any repo.

First, we patched the globality-build process to output the configuration to stdout. This way, we can programmatically capture it in a variable using subprocess.

We use the PyGithub library to manage the interaction. Here’s a snippet:
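For readers who have not used this endpoint, here is a rough sketch of the request that a file update boils down to, built with the standard library only. It targets GitHub’s `PUT /repos/{owner}/{repo}/contents/{path}` endpoint. The function name and all arguments are placeholders for illustration, and this is not how our tooling issues the call.

```python
import base64
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_update_request(owner, repo, path, new_text, current_sha,
                         branch, message, token):
    """Build (but do not send) a request that replaces one file on one branch."""
    payload = {
        "message": message,  # becomes the commit message
        # The API expects the new file body Base64-encoded.
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
        # Blob SHA of the file being replaced; required when updating.
        "sha": current_sha,
        "branch": branch,
    }
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{owner}/{repo}/contents/{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the update and create a regular commit on the given branch.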

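        # Render the CircleCI configuration and capture it from stdout
        # (check_output comes from the standard subprocess module).
        self.output = check_output(
            [
                "globality",
                "build",
                "circle-gen",
                "--configuration",
                str(self.build_config_content),
                "--output",
                "print",
            ],
        )

        if dry_run:
            # Show what would be committed without touching GitHub.
            print("Skipping github update for circle configuration")
            print(f"Updated file would be: {self.output}")
            return

        # Commit the new file directly to the plan's release branch; sha is
        # the blob SHA of the file currently stored in the repository.
        self.repository.update_file(
            path=f"/{CIRCLE_CONFIG}",
            message=f"Update Circle configuration - Automated for {plan.version}",
            content=self.output,
            sha=self.circle_sha,
            branch=plan.release_branch_name,
        )

        print(f"Updated the content of {CIRCLE_CONFIG} for {self.repository.name}")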

The flow is this:

This is all done through the GitHub API. No cloning or actual file operations are needed, and everything is tracked as regular commits.
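As an illustration of how this scales to many repositories, here is a small sketch of a patch loop. It relies only on the PyGithub repository methods `get_contents` and `update_file` plus the `name` attribute. The helper names, the `render` callback and the skip-if-unchanged check are assumptions added for the example, not a description of our internal tool.

```python
def patch_repository(repository, path, new_content, branch, message, dry_run=False):
    """Replace one file on one branch; return True if a commit was (or would be) made."""
    current = repository.get_contents(path, ref=branch)
    if current.decoded_content == new_content.encode("utf-8"):
        return False  # already up to date, so no commit is needed
    if dry_run:
        print(f"[dry run] would update {path} in {repository.name}")
        return True
    repository.update_file(
        path=path,
        message=message,
        content=new_content,
        sha=current.sha,  # blob SHA of the version being replaced
        branch=branch,
    )
    return True


def patch_all(repositories, render, path, branch, message, dry_run=False):
    """Apply freshly rendered configuration to every repository.

    `render` is a hypothetical callback that returns the new file content
    for a given repository. Returns the names of the repositories changed.
    """
    changed = []
    for repository in repositories:
        content = render(repository)
        if patch_repository(repository, path, content, branch, message, dry_run):
            changed.append(repository.name)
    return changed
```

Because the functions only depend on those three members, they can be exercised against a simple stand-in object before being pointed at real repositories.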

Here’s an underwhelming example of an automated update:

[Image: Configuration patch]

And another one:

[Image: Configuration patch]

Those are just recent examples of patching the configuration files across all of our repositories in an automated way.

Versioning

Now that you know we have a way to patch this through the GitHub API, you know we also have a way to control versioning.

We use git flow to control releases: develop is always the most recent code, release/ is a release candidate and master is the code that’s deployed to production.

We can patch the configuration at any point and on any branch, but we only patch “above” develop to fix crucial bugs.

In this case you know that the config that was used to build the project in release/ is the same config that was used in master and nothing changed.
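The branch rule above can be written down in a few lines. This is only a sketch of the policy as described here; the function name and the `critical_fix` flag are made up for the example.

```python
def branches_to_patch(branches, critical_fix=False):
    """Pick which git-flow branches receive a regenerated configuration.

    By default only develop is patched. Release candidates and master are
    included only when the change fixes a crucial bug.
    """
    targets = [name for name in branches if name == "develop"]
    if critical_fix:
        targets += [
            name for name in branches
            if name == "master" or name.startswith("release/")
        ]
    return targets
```

With branches `develop`, `release/1.2.0` and `master`, a routine change touches only `develop`, while a critical fix touches all three.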

Build Images

CircleCI and many other CI services give you the option to use a base image/machine for all your builds.

We have base build images for all project types.

The build images include any runtime dependency that the project needs, any native extensions, Linux packages and so on.

We publish those images every time a change is needed, and we allow specific projects to customize packages (in the build.json file).

Here’s a very simplistic image, for example:

    FROM circleci/python:3.6.2
    ARG EXTRA_INDEX_URL
    ENV EXTRA_INDEX_URL ${EXTRA_INDEX_URL}
    RUN pip install awscli httpie boto3
    RUN python -m pip install --extra-index-url $EXTRA_INDEX_URL

When we upgrade the Python/Node version, we just update globality-build, then rinse and repeat the process above.

We recently changed the Node version. It took precisely 2 minutes to have all builds running on the fresh Node version.

Again, release candidates remained untouched; the same concept of versioning applies here as well.

Conclusion

As the DevOps lead at Globality, I am constantly looking for ways to improve engineering workflows and make processes better and more intuitive for engineers.

By doing this, we solved a really painful workflow of updating and maintaining CI configurations.

This gives us great flexibility for change and allows us to move faster and innovate, without worrying about backwards compatibility or slow migration of projects.

Further reading

I wrote about another scaling issue we experienced before, keeping dependencies fresh. You can read about it here: Keeping Dependencies Fresh
