GitHub Engineering

GLB: GitHub's open source load balancer

At GitHub, we serve tens of thousands of requests every second out of our network edge, operating on GitHub’s metal cloud. We’ve previously introduced GLB, our scalable load balancing solution for bare metal datacenters, which powers the majority of GitHub’s public web and git traffic, as well as fronting some of our most critical internal systems such as highly available MySQL clusters. Today we’re excited to share more details about our load balancer’s design, as well as release the GLB Director as open source.

MySQL High Availability at GitHub

GitHub uses MySQL as its main datastore for all things non-git, and its availability is critical to GitHub’s operation. The site itself, GitHub’s API, authentication, and more all require database access. We run multiple MySQL clusters serving our different services and tasks. Our clusters use a classic master-replicas setup, where a single node in a cluster (the master) is able to accept writes. The rest of the cluster nodes (the replicas) asynchronously replay changes from the master and serve our read traffic.
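To make the read/write split concrete, here is a minimal sketch, assuming a hypothetical topology with one master endpoint and two replica endpoints (the hostnames are invented for illustration, not GitHub’s actual setup): writes go to the single master, reads go to any replica, and because replication is asynchronous, reads served by a replica may briefly lag the latest write.

```python
import random

# Hypothetical endpoints -- illustrative only, not GitHub's actual topology.
MASTER = "mysql-master.internal:3306"
REPLICAS = [
    "mysql-replica-1.internal:3306",
    "mysql-replica-2.internal:3306",
]

# Statements that must reach the single writable node.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "ALTER", "CREATE", "DROP"}

def route(statement: str) -> str:
    """Route writes to the master and reads to any replica.

    Replicas replay the master's changes asynchronously, so a read
    routed here may not yet reflect the most recent write.
    """
    keyword = statement.lstrip().split(None, 1)[0].upper()
    return MASTER if keyword in WRITE_KEYWORDS else random.choice(REPLICAS)

print(route("SELECT login FROM users WHERE id = 1"))      # some replica
print(route("UPDATE users SET name = 'x' WHERE id = 1"))  # the master
```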

Performance Impact of Removing OOBGC

Until last week, GitHub used an Out of Band Garbage Collector (OOBGC) in production. Since removing it, we’ve cut CPU time across our production machines by 10%. Let’s talk about what an OOBGC is, when to use it, and when not to use it, then follow up with some statistics about the impact of removing it from GitHub’s stack.
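GitHub’s OOBGC lived in its Ruby stack; as a language-neutral illustration of the idea only, here is a minimal Python sketch assuming a hypothetical single-threaded worker loop: automatic garbage collection is switched off while a request is in flight, and a collection is forced between requests, trading extra CPU after every response for pause-free request handling.

```python
import gc

def handle_request(request):
    # Placeholder for real request-handling work.
    return {"status": 200, "body": f"handled {request}"}

def serve(requests):
    """Serve requests with garbage collection pushed out of band.

    Automatic GC is disabled so a collection pause never lands in the
    middle of a response; instead we collect explicitly between
    requests, when the worker is otherwise idle.
    """
    gc.disable()  # no automatic collections mid-request
    try:
        for request in requests:
            yield handle_request(request)
            gc.collect()  # out-of-band collection, after the response is out
    finally:
        gc.enable()

for response in serve(["GET /", "GET /about"]):
    print(response)
```

The cost of this pattern is a full collection after every request whether or not one was needed, which is exactly the CPU time that removing an OOBGC gives back.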

Improving your OSS dependency workflow with Licensed

GitHub recently open sourced Licensed, a tool that caches and verifies the license metadata of a project’s dependencies, in the hope that it is as helpful to the OSS community as it has been to us.

February 28th DDoS Incident Report

On Wednesday, February 28, 2018 GitHub.com was unavailable from 17:21 to 17:26 UTC and intermittently unavailable from 17:26 to 17:30 UTC due to a distributed denial-of-service (DDoS) attack. We understand how much you rely on GitHub and we know the availability of our service is of critical importance to our users. To note, at no point was the confidentiality or integrity of your data at risk. We are sorry for the impact of this incident and would like to describe the event, the efforts we’ve taken to drive availability, and how we aim to improve response and mitigation moving forward.
