GitHub Engineering

Building resilience in Spokes

Spokes is the replication system for the file servers where we store over 38 million Git repositories and over 36 million gists. It keeps at least three copies of every repository and every gist so that we can provide durable, highly available access to content even when servers and networks fail. Spokes uses a combination of Git and rsync to replicate, repair, and rebalance repositories.

Context aware MySQL pools via HAProxy

At GitHub we use MySQL as our main datastore. While repository data lies in git, metadata is stored in MySQL. This includes Issues, Pull Requests, Comments etc. We also auth against MySQL via a custom git proxy (babeld). To be able to serve under the high load GitHub operates at, we use MySQL replication to scale out read load.

gh-ost: GitHub's online schema migration tool for MySQL

Today we are announcing the open source release of gh-ost: GitHub’s triggerless online schema migration tool for MySQL.

SYN Flood Mitigation with synsanity

GitHub hosts a wide range of user content, and like all large websites this often causes us to become a target of denial of service attacks. Around a year ago, GitHub was on the receiving end of a large, unusual and very well publicised attack involving both application level and volumetric attacks against our infrastructure.

GitHub's CSP journey

We shipped subresource integrity a few months back to reduce the risk of a compromised CDN serving malicious JavaScript. That is a big win, but does not address related content injection issues that may exist on GitHub.com itself. We have been tackling this side of the problem over the past few years and thought it would be fun, and hopefully useful, to share what we have been up to.

Older posts Newer posts