GitHub Engineering

At GitHub, we use a variant of the Flow pattern to deploy changes: new code is always deployed from a pull request branch, and merged only once it has been confirmed in production. master is our stable release branch, so everything on master is considered production-ready code. If a branch deploy ships bad code containing a bug or performance regression, it is rolled back by deploying the latest master to production.

Using this workflow, engineers at GitHub deploy changes to our website several hundred times every week.

tmm1 tmm1
/graph me 20150517..20150523 @github.deploys.total
hubot Hubot
[Image: graph of the number of deployments over one week]

All deployments happen in chat via Hubot commands, which ensures that everyone in the company (from development to operations to support) has visibility into changes that are being pushed into production.

tmm1 tmm1
/deploy github/my-feature to production
tmm1 is deploying github/my-feature (deadbeef) to production. Check out the haystack firehose so you're the first to know of any new exceptions.
tmm1's production deployment of github/my-feature (deadbeef) is done! (82s)
hubot Hubot
tmm1, make sure you watch for exceptions in haystack and perf issues at graphme

Deployments are reported to the GitHub API and show up in the timeline on corresponding pull requests.

[Image: deployments in pull request timeline]
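
As a rough illustration of reporting a deployment, the sketch below builds (but does not send) a request against the public GitHub Deployments REST endpoint, POST /repos/{owner}/{repo}/deployments. The helper name, the placeholder token, and the choice of fields are assumptions made for this example. It is not the actual Hubot or Heaven integration.

```python
import json
import urllib.request


def build_deployment_request(owner, repo, ref, environment, token):
    """Build (but do not send) a request that records a deployment of `ref`.

    Hypothetical helper: only `ref`, `environment` and `description` are set,
    all other API fields are left at their defaults.
    """
    body = {
        "ref": ref,
        "environment": environment,
        "description": f"Deploying {ref} to {environment}",
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/deployments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "PLACEHOLDER_TOKEN" stands in for a real credential.
request = build_deployment_request(
    "github", "github", "my-feature", "production", "PLACEHOLDER_TOKEN"
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would need a real token with access to the repository.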

Recent deployments are also available through chat.

tmm1 tmm1
/deployed github
hubot Hubot
2015-05-27 15:10:16 -0700 - rtomayko deployed github/regen-missing-merge-commits(a987454a) to production (compare)
2015-05-27 13:37:05 -0700 - connors deployed github/master(ce83e8bb) to production (compare)
2015-05-27 13:29:32 -0700 - jakeboxer deployed github/org-migration-logged-out-500(98451e5c) to production (compare)
2015-05-27 13:13:04 -0700 - grantr deployed github/add-at-to-code-tokenizer(d4daff29) to production (compare)
2015-05-27 12:04:40 -0700 - jakeboxer deployed github/fix-accept-invitation-timeout(f224305e) to production (compare)
2015-05-27 11:56:34 -0700 - jasonrudolph deployed github/api-fix-schema-doc-generation(b2805955) to production (compare)
2015-05-27 11:35:02 -0700 - spraints deployed github/silence-yajl-errors(3adc5bc7) to production (compare)

Over the years, we’ve built a number of deployment features into Hubot and Heaven (our Capistrano-based deployment API) to help streamline our process. Below are some of our favorites.

Deploy Queues

During peak work hours, multiple developers are often trying to deploy their changes to production. To avoid confusion and give everyone a fair chance, we can ask Hubot to add us to the deployment queue.

tmm1 tmm1
/queue me for github
hubot Hubot
tmm1: Ok, I added you to the queue for github. There are 3 people ahead of you.

We can also check the status of the queue, to deploy directly if it’s empty or to find a less busy time if it’s looking particularly full.

tmm1 tmm1
/queue for github
hubot Hubot
tmm1: The current queue for github: mastahyeti, aroben, bhuga

We can also unqueue ourselves if something comes up and we have to step away from the computer.

tmm1 tmm1
/unqueue me for github
hubot Hubot
tmm1: Ok, tmm1 isn't in the github queue anymore.
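
The queue commands above behave like a simple first-in, first-out list per application. Here is a minimal sketch of that idea. The DeployQueue class and its method names are hypothetical and are not taken from Hubot.

```python
from collections import deque


class DeployQueue:
    """FIFO queue of users waiting to deploy one application (illustrative)."""

    def __init__(self, app):
        self.app = app
        self._waiting = deque()

    def add(self, user):
        """Queue a user and return how many people are ahead of them."""
        if user in self._waiting:
            # Already queued: report the current position instead of re-adding.
            return self._waiting.index(user)
        self._waiting.append(user)
        return len(self._waiting) - 1

    def remove(self, user):
        """Drop a user from the queue; return True if they were in it."""
        try:
            self._waiting.remove(user)
            return True
        except ValueError:
            return False

    def next_up(self):
        """Pop and return the next user, or None if the queue is empty."""
        return self._waiting.popleft() if self._waiting else None

    def members(self):
        """Return the waiting users in deploy order."""
        return list(self._waiting)


queue = DeployQueue("github")
for user in ("mastahyeti", "aroben", "bhuga"):
    queue.add(user)
ahead = queue.add("tmm1")
print(f"There are {ahead} people ahead of you.")  # There are 3 people ahead of you.
queue.remove("tmm1")
print(", ".join(queue.members()))  # mastahyeti, aroben, bhuga
```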

Deploy Guards

To keep bad code from reaching production, Hubot won’t let us deploy a branch until its continuous integration tests have finished. This prevents trigger-finger deploys while CI is still running.

tmm1 tmm1
/deploy github/my-feature to production
hubot Hubot
tmm1: Sorry, I couldn't deploy github/my-feature: github and enterprise are still building.

Similarly, if CI completed but our branch failed some tests, Hubot will prevent us from deploying.

tmm1 tmm1
/deploy github/my-feature to production
hubot Hubot
tmm1: Sorry, I couldn't deploy github/my-feature: github and enterprise failed to build.

Since master is our stable release branch, we want to ensure that any branches being deployed are caught up with the latest code in master. Before proceeding with a deployment, Hubot will detect if our branch is behind and automatically merge in master if required.

tmm1 tmm1
/deploy github/my-feature to production
Auto-merged master into my-feature on deployment - Aman Gupta ( github/github@feedface )
Build #2077391 (feedface) of github/my-feature was successful (97s, queued 0s)
Build #2077392 (feedface) of enterprise/my-feature was successful (106s, queued 0s)
tmm1 is deploying github/my-feature (feedface) to production.
tmm1's production deployment of github/my-feature (feedface) is done! (100s)

To ensure deployments are visible to the rest of the team, Hubot forces us to deploy from specific chat rooms.

tmm1 tmm1
/deploy github/my-feature to production
hubot Hubot
tmm1: Sorry, github must be deployed from the appropriate room.

In rare emergency situations, it is possible to override these guards using /deploy!.
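
The CI and chat-room guards described above can be thought of as a chain of checks that run before a deploy starts. The sketch below is one way to model that. The DeployRequest type, the check order, the room name and the message wording are all assumptions for illustration, and the automatic merge of master is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DeployRequest:
    app: str
    branch: str
    room: str
    # Build name -> "pending", "success" or "failure" (assumed states).
    ci_status: dict = field(default_factory=dict)
    # True models the /deploy! override.
    force: bool = False


# Hypothetical mapping of applications to rooms they may be deployed from.
ALLOWED_ROOMS = {"github": {"deploy-room"}}


def check_guards(request):
    """Return a refusal message, or None if the deploy may proceed."""
    if request.force:
        return None
    if request.room not in ALLOWED_ROOMS.get(request.app, set()):
        return f"{request.app} must be deployed from the appropriate room."
    building = [n for n, state in request.ci_status.items() if state == "pending"]
    if building:
        return f"{' and '.join(building)} are still building."
    failed = [n for n, state in request.ci_status.items() if state == "failure"]
    if failed:
        return f"{' and '.join(failed)} failed to build."
    return None


pending = DeployRequest(
    app="github",
    branch="my-feature",
    room="deploy-room",
    ci_status={"github": "pending", "enterprise": "pending"},
)
print(check_guards(pending))  # github and enterprise are still building.
```

Returning a message instead of raising keeps the guard easy to relay back into chat, and the force flag skips every check in one place.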

Deploy Locks

As soon as a branch is deployed, Hubot locks the environment so no other branches can be deployed to it. This prevents others from accidentally deploying while a developer is testing their branch.

tmm1 tmm1
/deploy github/my-feature to production
hubot Hubot
tmm1: Sorry, github in production is locked by mastahyeti

Once we’ve merged our branch, Hubot will automatically unlock the environment and let the next person in the queue know they can deploy.

hubot Hubot
mastahyeti: it looks like you merged the "stronger-ssh-keys" branch into master, so I've unlocked github in production.
hubot Hubot
tmm1: you're up to deploy github!

We can also manually unlock deploys to let someone else have a turn, when we decide not to merge our branch just yet.

tmm1 tmm1
/unlock github in prod
hubot Hubot
tmm1: github in production is now unlocked.

Finally, during outages, attacks, and other emergency situations, we can lock deployments manually to prevent changes while we investigate problems.

tmm1 tmm1
/lock github in production investigating deployment api 500s
hubot Hubot
tmm1: github in production is now locked.
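
The locking behavior above, including the automatic unlock after a merge, can be sketched as a small state holder per application and environment. The DeployLock class below is hypothetical, and a plain list stands in for the deploy queue.

```python
class DeployLock:
    """Tracks who holds the deploy lock for one app/environment pair."""

    def __init__(self, app, environment):
        self.app = app
        self.environment = environment
        self.holder = None
        self.reason = None

    def acquire(self, user, reason):
        """Lock the environment for `user`; fail if someone else holds it."""
        if self.holder not in (None, user):
            return False
        self.holder, self.reason = user, reason
        return True

    def release(self):
        """Unlock the environment unconditionally (manual /unlock)."""
        self.holder, self.reason = None, None

    def branch_merged(self, user, waiting):
        """Unlock after `user` merges and return the next person to notify."""
        if self.holder != user:
            return None
        self.release()
        return waiting.pop(0) if waiting else None


lock = DeployLock("github", "production")
lock.acquire("mastahyeti", "testing the stronger-ssh-keys branch")
print(lock.acquire("tmm1", "testing my-feature"))  # False
waiting = ["tmm1"]
print(lock.branch_merged("mastahyeti", waiting))  # tmm1
```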

Deploy Environments

In addition to the main production environment, we can deploy to staging servers that are only accessible by GitHub staff. This staging environment closely mirrors our production environment, including real-world datasets to ensure high-fidelity testing.

To find out what environments are available for deployments, we can ask Hubot for a list and see which ones are currently unlocked.

tmm1 tmm1
/where can i deploy github
hubot Hubot
Deployment status for github:
--------------------------------------------------------------------------------
production: locked 42 seconds ago by grantr: testing the code-search-docvalues branch
lab: locked 15 minutes ago by tmm1: testing the git-repld-reconcile-full-resync branch
garage: unlocked
spider-skull-island: locked 13 minutes ago by grantr: testing the code-search-docvalues branch
machine-room: locked 20 hours ago by charliesome: testing the upgrade-route-syntax branch
branch-lab: always available

The queue for production has 1 person waiting.

The lab, garage and other staging environments each replicate different aspects of production: frontend web workers, background job queues, CDN setup for assets, Git fileserver workers, etc. Depending on what part of the stack a branch touches, we can pick a matching staging environment to exercise the new code without affecting production user traffic.

One of these environments is a special “branch lab” which does not require locking, because it sets up an isolated sandbox for each branch. This helps avoid deploy lock contention and lets developers and designers deploy experimental UI changes as shareable URLs they can send to others in the company for feedback.

tmm1 tmm1
/deploy github/my-feature to branch-lab
tmm1 is deploying github/my-feature (feedface) to branch-lab. Review the deployment checklist before deploying to production.
tmm1's branch-lab deployment of github/my-feature (feedface) is done! (22s)
hubot Hubot
tmm1: your branch-lab is available at https://my-feature.branch.github.com

The branch lab is implemented as a single staging server which runs one unicorn worker per branch. The branches deployed there can be listed via chat, and a branch can be deleted once it’s no longer being used. If the free memory on that server starts to run out, we automatically prune the oldest branches to free up some space.

tmm1 tmm1
/branch-lab list
hubot Hubot
+------------------------------+----------------+----------------+--------------------------------------------------------+
|                            branch-lab (9 branches, disk: 14G used + 91G free, mem: 3G free)                             |
+------------------------------+----------------+----------------+--------------------------------------------------------+
| octicons-playground          | aaronshekey    |    2 hours ago | https://octicons-playground.branch.github.com          |
| move-audit-log               | amosie         |    5 hours ago | https://move-audit-log.branch.github.com               |
| nh-audit-log-docs            | nickh          |    5 hours ago | https://nh-audit-log-docs.branch.github.com            |
| show-lfs-even-if-over        | tclem          |    6 hours ago | https://show-lfs-even-if-over.branch.github.com        |
| js-clone-selector            | dgraham        |    7 hours ago | https://js-clone-selector.branch.github.com            |
| clear-user-session-for-oauth | ptoomey3       |    7 hours ago | https://clear-user-session-for-oauth.branch.github.com |
| remove-wizard                | amosie         |   11 hours ago | https://remove-wizard.branch.github.com                |
| mobile-file-finder           | tmm1           |   18 hours ago | https://mobile-file-finder.branch.github.com           |
| ship-header-nav              | mdo            |   22 hours ago | https://ship-header-nav.branch.github.com              |
+------------------------------+----------------+----------------+--------------------------------------------------------+

We can also manually remove branches that we’re done testing, or have shipped to production already:

tmm1 tmm1
/branch-lab remove my-feature
hubot Hubot
Removing my-feature from branch-lab
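
The automatic pruning mentioned above could work roughly as follows: walk the branches from oldest to newest and mark them for removal until enough memory is estimated to be free again. The function name, the threshold and the fixed per-branch memory estimate are assumptions for this sketch. The real policy is not described in this post.

```python
def branches_to_prune(branches, free_mem_gb, min_free_gb, mem_per_branch_gb):
    """Pick the oldest branches to remove until estimated free memory recovers.

    `branches` is a list of (name, age_in_hours) pairs. Each removed branch is
    assumed to give back `mem_per_branch_gb` of memory.
    """
    doomed = []
    oldest_first = sorted(branches, key=lambda b: b[1], reverse=True)
    for name, _age in oldest_first:
        if free_mem_gb >= min_free_gb:
            break
        doomed.append(name)
        free_mem_gb += mem_per_branch_gb
    return doomed


deployed = [
    ("octicons-playground", 2),
    ("mobile-file-finder", 18),
    ("ship-header-nav", 22),
]
print(branches_to_prune(deployed, free_mem_gb=3, min_free_gb=4, mem_per_branch_gb=0.5))
# ['ship-header-nav', 'mobile-file-finder']
```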

Deploy Targets

Once a branch has passed automated tests, undergone code review, and been verified in staging, it is time to push it into production. Recall that GitHub engineers are not allowed to merge any pull request that has not yet been verified in production. Production traffic patterns and datasets often trigger edge cases that expose bugs and performance issues which might not have been seen otherwise, and we want to ensure that our master branch always represents our stable production release.

To safely roll out a risky branch, we can ask Hubot to deploy it to a specific subset of servers within an environment. This limits the user impact of the change, and allows us to monitor for new exceptions or performance regressions coming from the servers that are running our branch.

A change to the Rails version, for example, can be deployed to one or two frontend webservers, and if things look good we can continue to deploy it to more frontends. Similarly, an upgraded version of Git could be deployed to a handful of backend fileservers.

tmm1 tmm1
/deploy github/rails-6-upgrade to production/fe130,fe131
hubot Hubot
tmm1 is deploying github/rails-6-upgrade (feedbeef) to production (github-fe130-cp1-prd, github-fe131-cp1-prd).
tmm1's production deployment of github/rails-6-upgrade (feedbeef) is done! (46s)
hubot Hubot
Exceptions have recently elevated on github (12 exceptions) in the last 3 minutes. tmm1 was the last person to deploy at 07:40 pm PDT (-0700). Care to check it out in haystack?

Once we’ve gained confidence in our branch, we can deploy it to all of production and then merge it to unlock deployments for the next developer in the queue.
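
To make the target syntax concrete, here is a sketch of parsing a full-form command such as the one above into application, branch, environment and an optional host list. The regular expression and the parse_deploy helper are illustrative guesses at the grammar, not Hubot's actual parser, and shorter forms such as /deploy dns are not handled.

```python
import re

# Assumed grammar: /deploy[!] <app>/<branch> to <environment>[/<host>,<host>...]
DEPLOY_RE = re.compile(
    r"^/deploy(?P<force>!)?\s+(?P<app>[\w-]+)/(?P<branch>[\w./-]+)"
    r"\s+to\s+(?P<env>[\w-]+)(?:/(?P<hosts>[\w,-]+))?$"
)


def parse_deploy(command):
    """Parse a deploy command; return a dict, or None if it does not match."""
    match = DEPLOY_RE.match(command.strip())
    if match is None:
        return None
    hosts = match.group("hosts")
    return {
        "app": match.group("app"),
        "branch": match.group("branch"),
        "environment": match.group("env"),
        # An empty list means "the whole environment".
        "hosts": hosts.split(",") if hosts else [],
        "force": match.group("force") is not None,
    }


print(parse_deploy("/deploy github/rails-6-upgrade to production/fe130,fe131"))
```

Treating an empty host list as the whole environment lets the same structure describe both a partial rollout and a full deploy.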

Deploy Everything

Our deployment chatops and workflows work so well that we use them for everything in the company. If we want to add a DNS record, we make a pull request to github/dns and use /deploy dns. If we want to add a monitoring alert for a new service, we make a pull request to github/nagios and use /deploy nagios. If we want to install a new software package on a specific frontend, we use /deploy puppet/add-package-branch to prod/fe142. We even use similar workflows to ship new versions of our native desktop apps.

If you aren’t using them already, we highly recommend you try some of the techniques mentioned in this blog post. This workflow brings a ton of great benefits, including:

  • Low training overhead - every time you deploy, you’re training the next person who is watching the chat transcript.
  • Deep integration into chat and the GitHub API means everyone in the company has visibility into deployments.
  • The process is so lightweight we can use it even for a tiny one-line change, making code review much more likely to happen.
  • Frequent feedback on small ships ensures nobody spends two weeks on a bad approach.
  • Merge conflicts are rare and trivial when they arise, despite our monolithic architecture.
  • The process gives developers ownership, agency, quick wins, and meaningful responsibility. These are what developer happiness is all about.
tmm1

Infrastructure Engineer