GitHub Engineering

octocatalog-diff: GitHub's Puppet development and testing tool

Today we are announcing the open source release of octocatalog-diff: GitHub’s Puppet development and testing tool.

Introducing the GitHub Load Balancer

At GitHub we serve billions of HTTP, Git and SSH connections each day. To get the best performance, we run on bare metal hardware. Historically, one of the more complex components has been our load balancing tier. Traditionally we scaled this vertically, running haproxy on a small set of very large machines with a very specific hardware configuration that allowed dedicated 10G link failover. Eventually we needed a more scalable solution, so we set out to create a load balancer that would run on commodity hardware in our typical data center configuration.

The GitHub GraphQL API

GitHub announced a public API one month after the site launched. We’ve evolved this platform through three versions, adhering to RFC standards and embracing new design patterns to provide a clear and consistent interface. We’ve often heard that our REST API was an inspiration for other companies; countless tutorials refer to our endpoints. Today, we’re excited to announce our biggest change to the API since we snubbed XML in favor of JSON: we’re making the GitHub API available through GraphQL.

Building resilience in Spokes

Spokes is the replication system for the file servers where we store over 38 million Git repositories and over 36 million gists. It keeps at least three copies of every repository and every gist so that we can provide durable, highly available access to content even when servers and networks fail. Spokes uses a combination of Git and rsync to replicate, repair, and rebalance repositories.

Context aware MySQL pools via HAProxy

At GitHub we use MySQL as our main datastore. While repository data lives in Git, metadata is stored in MySQL. This includes Issues, Pull Requests, Comments, etc. We also authenticate against MySQL via a custom Git proxy (babeld). To serve the high load GitHub operates under, we use MySQL replication to scale out read load.
