GitHub Engineering

Careful use of concurrency is particularly important when writing responsive desktop applications. Typically, complex operations are executed on background threads. This results in an app that remains responsive to user input, while still performing complex tasks.

In GitHub Desktop, many background threads read from or write to the same Git repository at the same time.

However, git is typically not used in a concurrent fashion. When using git via the command line, operations are executed in a sequential manner. Read or write operations are performed against git, independently of each other.

Commands are executed in serial, on the command line

During the build of GitHub Desktop, we discovered executing git commands serially was a one-way ticket to an unresponsive app. For example, waiting to load diffs until after we’ve counted the number of commits in the history would result in a slow and unresponsive application.

To maintain correctness and a responsive user interface, we needed a solution to concurrency control.

Git, libgit2 and concurrency

GitHub Desktop has two methods of interacting with a git repository.

  • Calling into C implementations of the Git core methods via libgit2
  • Shelling out to the git command line interface

We would like to use libgit2 for all of our git operations because it is faster and easier to program with. Unfortunately, it is not yet a complete implementation, so we use the CLI to fill in the missing functionality. This poses an interesting problem: git and libgit2 take different approaches to concurrency control.

Git implements a pessimistic approach to concurrency control. Lock files are used to prevent concurrent access to the underlying git objects on disk. When performing an operation against a git object, git will create a *.lock file inside the .git directory. This signals that the * object is locked for use. Further operations are prevented until the lock is released and the *.lock file is deleted.
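The lock-file idea can be sketched in a few lines of C#. This is only an illustration of the technique, not git's actual implementation: the `LockFileSketch` class and its method names are hypothetical, and the sketch relies on the fact that `FileMode.CreateNew` fails if the file already exists, so only one caller can create a given *.lock file.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating pessimistic locking with a *.lock file.
// It mimics the idea described above; it is not how git itself is written.
public static class LockFileSketch
{
    // Tries to lock `path` by creating `path + ".lock"`.
    // Returns false if the lock file could not be created, which is
    // typically because another holder already created it.
    public static bool TryAcquire(string path)
    {
        try
        {
            // CreateNew throws an IOException when the file already exists,
            // so at most one caller succeeds in creating the lock file.
            using (new FileStream(path + ".lock", FileMode.CreateNew,
                                  FileAccess.Write, FileShare.None))
            {
            }
            return true;
        }
        catch (IOException)
        {
            return false;
        }
    }

    // Releases the lock by deleting the *.lock file.
    public static void Release(string path)
    {
        File.Delete(path + ".lock");
    }
}
```

A caller would wrap its work in `TryAcquire` and `Release`, and back off or report an error when `TryAcquire` returns false.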

By contrast, libgit2 cannot guarantee that objects can safely be shared between threads. Mutable data structures in libgit2 are not thread safe, and operations must be performed carefully. The libgit2 API allows you to compose granular operations together, and granular locking would come at a performance cost. Libgit2 data structures are rarely used in isolation, so concurrency control should be implemented at the level of a collection of fine-grained operations, that is, a single unit of work.

A new concurrency model

GitHub Desktop ships as a native application on both Mac and Windows. The Mac app is implemented in Objective-C, while the Windows app is implemented in C#. Both platforms are implemented in a reactive style, using Microsoft’s Reactive Extensions (Rx) and our own ReactiveCocoa (RAC). This allows the composition of background tasks, such as executing git operations. All git operations are executed asynchronously and across thread boundaries.

To ensure GitHub Desktop executes git operations in a safe yet performant manner, we needed a new concurrency model that enabled us to:

  • Organize work at the level of asynchronous Observables (Rx) and Signals (RAC) instead of synchronous blocks of code.

  • Perform most operations concurrently.

  • Retain the ability to perform destructive operations serially and exclusively, as required by Git or libgit2

Concurrent and exclusive locks

Each high level operation GitHub Desktop performs can be thought of as a unit of work. A single unit of work can be made up of many fine-grained operations. Our units of work can be categorized as either:

  • Concurrent operations
  • Exclusive operations

Concurrent and exclusive operations don’t always have a 1:1 relationship with reading and writing to the underlying repository. For example, it is safe to write Git refs concurrently with other work, because a ref update is atomic. On the other hand, some read operations may update caches in an unsafe way, and so those need to be performed exclusively.

GitHub Desktop uses an AsyncReaderWriterLock as a queue, upon which operations can be run either exclusively or in parallel. Exclusive operations behave like a barrier, waiting for previously-enqueued work to complete before beginning, and themselves finishing before any further work starts.
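To make the barrier behaviour concrete, here is a minimal sketch of such a queue. It is an assumption-laden illustration rather than the lock GitHub Desktop ships: it uses plain `Task`s instead of Rx observables to stay dependency-free, the class name `AsyncReaderWriterLockSketch` and the `LockDemo` helper are hypothetical, and only the method names `AddConcurrentOperation` and `AddExclusiveOperation` are borrowed from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical Task-based sketch of a concurrent/exclusive operation queue.
public sealed class AsyncReaderWriterLockSketch
{
    readonly object gate = new object();
    // Concurrent operations enqueued since the most recent exclusive one.
    readonly List<Task> concurrentSinceBarrier = new List<Task>();
    // The most recently enqueued exclusive operation (the current barrier).
    Task lastExclusive = Task.CompletedTask;

    public Task<T> AddConcurrentOperation<T>(Func<Task<T>> operation)
    {
        lock (gate)
        {
            concurrentSinceBarrier.RemoveAll(t => t.IsCompleted);
            // Concurrent work only waits for the last barrier.
            var task = RunAfter(lastExclusive, operation);
            concurrentSinceBarrier.Add(task);
            return task;
        }
    }

    public Task<T> AddExclusiveOperation<T>(Func<Task<T>> operation)
    {
        lock (gate)
        {
            // Exclusive work waits for everything enqueued before it.
            var prior = new List<Task>(concurrentSinceBarrier) { lastExclusive };
            concurrentSinceBarrier.Clear();
            var task = RunAfter(Task.WhenAll(prior), operation);
            lastExclusive = task;
            return task;
        }
    }

    static async Task<T> RunAfter<T>(Task predecessor, Func<Task<T>> operation)
    {
        await Task.Yield(); // never run caller code while holding the gate
        try { await predecessor.ConfigureAwait(false); }
        catch (Exception) { /* a failed predecessor must not stall the queue */ }
        return await operation().ConfigureAwait(false);
    }
}

public static class LockDemo
{
    // Enqueues two concurrent operations, one exclusive, then one concurrent,
    // and returns the order in which they started and ended.
    public static async Task<List<string>> RunAsync()
    {
        var rwLock = new AsyncReaderWriterLockSketch();
        var log = new List<string>();
        void Log(string s) { lock (log) log.Add(s); }

        Func<string, int, Func<Task<int>>> work = (name, ms) => async () =>
        {
            Log(name + " start");
            await Task.Delay(ms); // the thread is released here
            Log(name + " end");
            return 0;
        };

        var a = rwLock.AddConcurrentOperation(work("fetch", 50));
        var b = rwLock.AddConcurrentOperation(work("merge-base", 50));
        var c = rwLock.AddExclusiveOperation(work("commit", 10));
        var d = rwLock.AddConcurrentOperation(work("status", 10));
        await Task.WhenAll(a, b, c, d);
        return log;
    }
}
```

In the log returned by `LockDemo.RunAsync()`, "commit start" always appears after both "fetch end" and "merge-base end", and "status start" always appears after "commit end", even though every operation gives up its thread while waiting. The relative order of the fetch and merge-base entries is not fixed, since those two are free to overlap.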

To execute Git operations, the appropriate lock must first be acquired.

using System;
using System.Reactive;
using System.Reactive.Linq;
using LibGit2Sharp;

public class RepositoryConnection
{
  readonly string localDotGitPath;
  readonly AsyncReaderWriterLock readerWriterLock;

  public RepositoryConnection(string dotGitPath, AsyncReaderWriterLock rwLock)
  {
    localDotGitPath = dotGitPath;
    readerWriterLock = rwLock;
  }

  public IObservable<T> OpenConcurrentConnection<T>(Func<IConcurrentRepositoryConnection, IObservable<T>> operation)
  {
     var connection = Observable.Defer(() =>
     {
         // create a new libgit2 repository object for a given path on disk
         var repo = new Repository(localDotGitPath, new RepositoryOptions());
         return Observable.Return(new ConcurrentRepositoryConnection(repo));
     });

     // Defer the given operation, and close the connection on error or completion.
     var executeAndClose = connection.SelectMany(conn => Observable.Defer(() => operation(conn))
                                     .Do(x => {}, ex => conn.CloseConnection(), () => conn.CloseConnection()));

     // Add it to the concurrent lock queue.
     return readerWriterLock.AddConcurrentOperation(executeAndClose);
  }

  public IObservable<T> OpenExclusiveConnection<T>(Func<IExclusiveRepositoryConnection, IObservable<T>> operation)
  {
     var connection = Observable.Defer(() =>
     {
         // create a new libgit2 repository object for a given path on disk
         var repo = new Repository(localDotGitPath, new RepositoryOptions());
         return Observable.Return(new ExclusiveRepositoryConnection(repo));
     });

     // Defer the given operation, and close the connection on error or completion.
     var executeAndClose = connection.SelectMany(conn => Observable.Defer(() => operation(conn))
                                     .Do(x => {}, ex => conn.CloseConnection(), () => conn.CloseConnection()));

     // Add it to the exclusive lock queue.
     return readerWriterLock.AddExclusiveOperation(executeAndClose);
  }
}

Inside GitHub Desktop we define two interfaces. In C# these are IExclusiveRepositoryConnection.cs and IConcurrentRepositoryConnection.cs, while in Objective-C they are defined by GHExclusiveGitConnection.h and GHGitConnection.h. Each of these implementations allows only the git operations which make sense for that lock type.

The ExclusiveRepositoryConnection defines only operations which must be performed with exclusive access to the underlying repository object. Likewise, the ConcurrentRepositoryConnection defines only operations which are safe to perform concurrently. This means it is impossible to execute exclusive operations concurrently and concurrent operations exclusively. In this way we are able to prevent possible data corruption without a performance trade-off.

public class ExclusiveRepositoryConnection : IExclusiveRepositoryConnection
{
  private IRepository repository;

  public ExclusiveRepositoryConnection(IRepository repository)
  {
    this.repository = repository;
  }

  public IObservable<Unit> Commit()
  {
    // Commit to the repository
  }

  public IObservable<Unit> CloseConnection()
  {
    // Execute any required clean up
  }

  //... More exclusive operations
}

For concurrent operations, we define a similar class.

public class ConcurrentRepositoryConnection : IConcurrentRepositoryConnection
{
  private IRepository repository;

  public ConcurrentRepositoryConnection(IRepository repository)
  {
    this.repository = repository;
  }

  public IObservable<Unit> Fetch()
  {
    // Execute a fetch against the repository
  }

  public IObservable<Sha> FindMergeBase(Sha one, Sha two)
  {
    // Calculate the merge base between two SHAs
  }

  public IObservable<Unit> CloseConnection()
  {
    // Execute any required clean up
  }

  //... More concurrent operations
}

Below is an example of how we might execute a fetch, calculate a merge base and create a commit. Each operation is executed asynchronously using Reactive Extensions, and inside either a concurrent or an exclusive lock. In this example, each operation is queued according to the kind of lock requested. Both Fetch and FindMergeBase will execute concurrently with respect to each other. However, Commit will be queued until all currently executing operations have completed. No subsequent operation will execute until the Commit has completed.


public class LockExample
{
  private RepositoryConnection repositoryConnection;

  public LockExample(RepositoryConnection repositoryConnection)
  {
    this.repositoryConnection = repositoryConnection;
  }

  public void DoWork(Sha first, Sha second)
  {
    repositoryConnection.OpenConcurrentConnection(connection => connection.Fetch())
              .Subscribe(_ => { }, () => Console.WriteLine("Fetch Completed"));

    repositoryConnection.OpenConcurrentConnection(connection => connection.FindMergeBase(first, second))
              .Subscribe(_ => { }, () => Console.WriteLine("Find Merge Base Completed"));

    repositoryConnection.OpenExclusiveConnection(connection => connection.Commit())
              .Subscribe(_ => { }, () => Console.WriteLine("Commit Completed"));
  }
}

Unlike a queue of synchronous work as you might find in Apple’s Grand Central Dispatch or Clojure’s core.async, we treat our asynchronous and thread-hopping operations as atomic units of work. This means that even if we relinquish all threads while waiting for some data, our queue doesn’t actually move onto the next thing until the operation says it’s well and truly completed.

The impact

Before these changes, GitHub Desktop suffered from race conditions as units of work would become interleaved in error.

Since implementing the concurrent/exclusive locks, we have seen an improvement in stability and performance. We now have a way to talk about concurrency control at a higher level: the level of a single unit of work.

By carefully managing git concurrency, GitHub Desktop protects your repositories from possible corruption. The end result is an app that remains responsive, while putting the integrity of your repository first.
