GitHub Engineering

GitHub recently open sourced Licensed in the hope that it will be as helpful to the OSS community as it has been to us.

<disclaimer>
1 of 1 consulted lawyers agree: Licensed is not a replacement for legal advice from a human.
</disclaimer>

Glossary

Before we go any further, let’s review a few terms that will be repeated throughout this article:

  1. Dependency: An external software package used in an application
    • e.g. packages that are required or imported, like Octokit, ActiveRecord, or React
  2. Dependency source: A class that can enumerate dependencies for an application
    • e.g. by invoking a package management tool such as bundler, npm, bower, or cabal

What is Licensed?

Licensed helps GitHub engineers make efficient use of OSS by surfacing potential problems with a dependency’s license early in our development cycle, and by ensuring that we maintain dependency license documentation throughout that cycle.

In practice, enumerating dependencies can be difficult. In the easiest scenario, a package manager provides a full listing of project dependencies in a parseable file. More difficult scenarios require detailed knowledge of CLI tools, such as using go list for a general-purpose Golang solution or ghc-pkg for Haskell package managers.
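To make the easy case concrete, here is a minimal sketch of enumerating dependencies from a parseable file. It assumes a JSON lock file whose top-level "dependencies" object maps package names to metadata, a simplified version of the layout used by older npm lock files. The `enumerate_dependencies` helper and the sample data are hypothetical, and this is not how Licensed itself is implemented.

```ruby
require "json"

# Hypothetical helper (not part of Licensed): list name/version pairs from a
# lock file whose top-level "dependencies" object maps names to metadata.
def enumerate_dependencies(lockfile_json)
  data = JSON.parse(lockfile_json)
  data.fetch("dependencies", {}).map do |name, info|
    { name: name, version: info["version"] }
  end
end

# Illustrative sample data only.
SAMPLE_LOCKFILE = <<~LOCK
  {
    "name": "example-app",
    "dependencies": {
      "react": { "version": "16.2.0" },
      "left-pad": { "version": "1.3.0" }
    }
  }
LOCK

enumerate_dependencies(SAMPLE_LOCKFILE).each do |dep|
  puts "#{dep[:name]} #{dep[:version]}"
end
```

The harder scenarios replace the `JSON.parse` step with a call out to a CLI tool and a parser for that tool's output.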

How Licensed works

Licensed works in any Git repository to find, cache and check license metadata for dependencies. It can detect dependencies from multiple languages and package managers across multiple projects in a single repository. This flexibility allows Licensed to work as well for a monolithic repository as it does for a repository containing a single project.

Licensed uses a configuration file to determine how and where to enumerate dependencies for a repository. Configuration files specify one or more Licensed applications, where an application describes a location to enumerate dependencies and a directory to store metadata. For more information on configuration files and Licensed applications, see the Licensed documentation.
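As a rough illustration of the idea, the sketch below loads a small YAML configuration describing two applications, each with a location to enumerate and a directory for cached metadata. The key names (`apps`, `source_path`, `cache_path`), the paths, and the `load_applications` helper are assumptions made for this example; the Licensed documentation is the authority on the real configuration format.

```ruby
require "yaml"

# Illustrative configuration only; key names and paths are assumptions.
SAMPLE_CONFIG = <<~CONFIG
  apps:
    - source_path: services/api
      cache_path: .licenses/api
    - source_path: web/client
      cache_path: .licenses/client
CONFIG

# Hypothetical helper: return one hash per configured application.
def load_applications(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  Array(config["apps"]).map do |app|
    {
      source_path: app.fetch("source_path"),
      cache_path: app.fetch("cache_path")
    }
  end
end

load_applications(SAMPLE_CONFIG).each do |app|
  puts "enumerate #{app[:source_path]}, cache in #{app[:cache_path]}"
end
```

Listing several applications in one file is what lets a single repository cover many projects.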

Finding license metadata

Licensed enumerates dependencies for each application’s source path found in the configuration. For each dependency found, Licensed finds the dependency’s source location in the local environment and extracts its basic metadata (e.g. name, version, homepage and summary).

Licensed uses Licensee to determine each dependency’s license(s) and find its license text (e.g. LICENSE) from the local dependency source location.

Caching license metadata

Once Licensed has the dependency’s metadata, it caches the metadata and license information for the project at the cache path(s) specified in the Licensed configuration file.
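A minimal sketch of the caching step might look like the following. The file layout (one YAML file per dependency, grouped by dependency type) and the helper names are assumptions made for illustration, not the format Licensed actually writes.

```ruby
require "yaml"
require "fileutils"
require "tmpdir"

# Hypothetical cache layout: <cache_path>/<type>/<name>.yml
def cache_dependency(cache_path, metadata, license_text)
  dir = File.join(cache_path, metadata.fetch("type"))
  FileUtils.mkdir_p(dir)
  file = File.join(dir, "#{metadata.fetch("name")}.yml")
  # Store the metadata and the license text together in one record.
  File.write(file, metadata.merge("license_text" => license_text).to_yaml)
  file
end

# Read a cached record back as a plain hash.
def read_cached_dependency(file)
  YAML.safe_load(File.read(file))
end

# Usage: write one record into a temporary cache directory and read it back.
Dir.mktmpdir do |cache_path|
  metadata = { "type" => "bundler", "name" => "licensed", "version" => "0.13.0" }
  file = cache_dependency(cache_path, metadata, "MIT License")
  puts read_cached_dependency(file)["version"]
end
```

Because each record is a small text file, a change to a dependency shows up as an ordinary diff in the repository.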

Storing the dependency data in a source control repository enables checking dependency data as part of the development workflow. Requiring updates to license data whenever dependencies change forces the license data to stay up to date and relevant.

Keeping the cached data in a source control repository also means you automatically get a history of every dependency change in a single location. Tracking down when a specific dependency changed becomes easier when there is a common location and fewer commits to look through.

Many licenses require that a copy of the license be distributed with any downstream project that uses the dependency. Licensed makes it easy to automate building and distributing these licenses alongside the project source. Taken together, they form an open source bill of materials for your project.

Checking license metadata

Lastly, Licensed is used to report any dependencies needing review. When checking dependency licenses, Licensed performs the following verifications:

  • Verify cached license metadata exists for the dependency
  • Verify the cached metadata is for the correct dependency version
  • Verify the cached metadata has license text
  • Verify the cached metadata uses an allowed license, or the dependency has been reviewed and accepted
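
The verifications above can be sketched as a single function that returns a list of problems for one dependency. The record shapes (plain hashes with string keys), the `allowed` and `reviewed` arguments, and the error messages are assumptions made for this example rather than Licensed's actual implementation.

```ruby
# Hypothetical check: `dependency` describes what is installed, `cached` is the
# cached record (or nil when no cached record exists).
def check_dependency(dependency, cached, allowed:, reviewed: [])
  # 1. Cached license metadata must exist.
  return ["cached license metadata missing"] if cached.nil?

  errors = []
  # 2. The cached metadata must match the dependency version in use.
  if cached["version"] != dependency["version"]
    errors << "cached metadata is for a different version"
  end
  # 3. The cached metadata must include license text.
  errors << "license text missing" if cached["license_text"].to_s.strip.empty?
  # 4. The license must be allowed, or the dependency reviewed and accepted.
  unless allowed.include?(cached["license"]) || reviewed.include?(dependency["name"])
    errors << "license needs review"
  end
  errors
end

dependency = { "name" => "licensed", "version" => "0.13.0" }
cached = { "version" => "0.13.0", "license" => "mit", "license_text" => "MIT License" }
puts check_dependency(dependency, cached, allowed: ["mit"]).empty? ? "ok" : "needs review"
```

An empty list means the dependency passes; anything else is what a CI job would report back to the developer.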

Licensed as part of the developer workflow at GitHub

GitHub engineers have a shared responsibility to ensure that their projects stay compliant with our OSS license requirements.

As the first line of defense in ensuring that dependencies meet our OSS license requirements, each repository has a CI job that checks dependency licenses. This process generally has little impact on developers, and only requires additional effort when a change might not meet our requirements.

When a license needs to be updated, it’s easy to do:

  1. A developer opens a pull request that includes changes to the project dependencies
  2. The repository CI job shows that one or more dependency licenses need review, and provides feedback on next steps for resolving the errors
  3. The developer caches license data for the updated dependencies, including the metadata files in the pull request
  4. The repository CODEOWNERS file requests a review from subject matter experts
  5. The subject matter expert reviews the changes and provides guidance to resolve any remaining questions

This process works very well at GitHub. Involving subject matter experts early in the process reduces friction for the developer and keeps dependencies out of the product when their license terms don’t meet our requirements.

Extending Licensed for new dependency sources

Whenever a new project is started, we always try to use the best tool for the job. In many cases this means a new language or framework that isn’t supported by Licensed. To handle these cases, we’ve made adding a new dependency source to Licensed as easy as possible.

Here is a simple example of a dependency source:

module MyProject
  class MySource

    # Required.  I need a configuration for basic functionality
    def initialize(config)
      @config = config
    end

    # Required.  Tell the world the name of the dependency source
    def type
      "my source"
    end

    # Required.  Give the world the dependencies found for `@config`
    def dependencies
      # Will this parse a package manager file?
      # Will this use CLI tools to find dependencies?
      # Nope!  I'm a hardcoded list!
      [
        Licensed::Dependency.new( # assumes the licensed gem has been required
          @config.source_path, # location used to find license text (e.g. LICENSE)
          name: "licensed",
          type: type,
          homepage: "https://github.com/github/licensed",
          version: "0.13.0",
          summary: "Extract and validate the licenses of dependencies."
        )
      ]
    end
  end
end

Next steps

Future development for Licensed will focus on:

  1. Reducing friction when using Licensed in developer workflows
  2. Reducing friction when adding new dependency sources
  3. Adding new dependency sources :smile:

Licensed isn’t just about open source; it is open source itself. Interested in adapting the tool to your team’s workflow or adding support for your favorite package manager? We’d love your help.
