Updating Files With the GitHub API - Python

Posted on Dec 13, 2018

One of the features we have at Globality is updating build/settings files across all of our projects using the GitHub API.

We use this when we generate a new build configuration or a new test configuration. Our build configuration consists of multiple files and directories, usually following the convention of a build.{{ project_name }} directory with Docker/shell files under it, plus some files in the root of the project like entrypoint.sh and others.

When we generate a new build configuration, we want that to be done in a single commit and to have an obvious commit message. This way, we can track down changes and we don’t have to go through a list of commits.

The easy way to do this would be to use the update_file API, but that generates a separate commit/diff per file, which is not what we want.
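
For context, updating one file at a time with PyGithub looks roughly like this (a sketch; the token, repository, and file names are illustrative) - each call produces its own commit:

from github import Github

# Illustrative names: replace the token, repository, and file path with real values.
repo = Github("<token>").get_repo("my-org/my-repo")

# update_file needs the current blob sha of the file being replaced.
contents = repo.get_contents("settings.yml", ref="master")
repo.update_file(
    path=contents.path,
    message="Update settings.yml",
    content="new: contents\n",
    sha=contents.sha,
    branch="master",
)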

The anatomy of a git commit

Before we dive into the code, we need to understand what happens when you do git commit.

With git, you have a few types of “objects”:

  1. blob
  2. tree
  3. tag
  4. commit

Every time you generate a commit, you essentially generate a tree of blob objects that’s “connected” to another tree.

We need to replicate the same thing with GitHub’s API sugar and make sure we do it the right way.
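
To make the object model concrete, here is a small PyGithub sketch (illustrative repository name) that walks from a branch ref to its commit, the commit’s tree, and the blobs inside that tree:

from github import Github

repo = Github("<token>").get_repo("my-org/my-repo")  # illustrative

ref = repo.get_git_ref("heads/master")        # the branch ref points at a commit object
commit = repo.get_git_commit(ref.object.sha)  # the commit points at a tree object
tree = commit.tree

# the tree lists blobs (files) and nested trees (directories)
for element in tree.tree:
    print(element.path, element.type, element.mode, element.sha)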

You can read more about this in depth in this blog post [1].

Creating and connecting a tree

What we need to do “manually” is to:

  1. Create input elements - these are blob objects (files) with the right mode (executable, for example). Each of these elements is also a tree element.
  2. Get the head ref, the latest commit it points to, and that commit’s tree.
  3. Generate a new tree and connect it to the base tree.

Diving into the code

For clarity, imagine we have a GitFile object with a file_contents() method and a filename attribute. A hypothetical sketch of such a helper might look like this:
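
from dataclasses import dataclass


@dataclass
class GitFile:
    # Hypothetical helper: the path of the file inside the repository
    filename: str
    # Wherever the generated contents come from; here, a local file
    local_path: str

    def file_contents(self) -> str:
        with open(self.local_path) as fileobj:
            return fileobj.read()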

We pass in a list of these objects to be patched.

# InputGitTreeElement describes one entry (blob) in the tree we are about to create.
from github import InputGitTreeElement

input_elements = [
    InputGitTreeElement(
        path=_file.filename,
        # git file modes: 100755 for executable files, 100644 for regular files
        mode='100755' if self.is_executable(_file.filename) else '100644',
        type='blob',
        content=_file.file_contents(),
    )
    for _file in self.files
]

# Resolve the branch ref to its latest commit and grab that commit's tree.
head_ref = self.get_repo().get_git_ref(f"heads/{self.get_branch()}")
latest_commit = self.get_repo().get_git_commit(head_ref.object.sha)
base_tree = latest_commit.tree

# Create a new tree on top of the base tree containing our changed files.
new_tree = self.get_repo().create_git_tree(input_elements, base_tree)

# Wrap the new tree in a single commit whose parent is the previous head commit.
new_commit = self.get_repo().create_git_commit(
    message=f"Generating build files through globality-build {version}",
    parents=[latest_commit],
    tree=new_tree,
)

# Finally, move the branch ref to point at the new commit (fast-forward only).
head_ref.edit(sha=new_commit.sha, force=False)

Here, we generate a commit on the default branch, but with slight modifications we can create new branches and open a PR (we do that too in other contexts).
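
For example, instead of moving the default branch, a sketch of that variation (reusing new_commit and version from the snippet above; the branch name and PR body are illustrative) might look like this:

# Hypothetical variation: point a new branch at the commit and open a pull request.
branch_name = "build-files-update"  # illustrative
repo = self.get_repo()

repo.create_git_ref(ref=f"refs/heads/{branch_name}", sha=new_commit.sha)
repo.create_pull(
    title=f"Generating build files through globality-build {version}",
    body="Automated build configuration update.",
    base=repo.default_branch,
    head=branch_name,
)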

Either way, the result is a single clean commit containing exactly the file changes we want.

Example commit

Conclusion

We automate almost every part of our en masse interactions with GitHub: repository creation, release branch creation, tag creation, and more.

We have well-defined defaults and conventions that help ensure repositories look the same and branches are directed the right way. Following these conventions keeps the automation predictable and idempotent (among other properties).

When we release a new build version, we can easily regenerate all of the build files across all of our repositories; when we generate a new CI configuration - same thing; upgrading Node - same thing.

When an engineer creates a new service, Node module, Python library, or any other type of project, they can generate all of the build/CI configuration with one click of a button. This removes the bottleneck of depending on an ops person and allows us to move faster across multiple time zones.


  1. https://blog.thoughtram.io/git/2014/11/18/the-anatomy-of-a-git-commit.html ↩︎