GitHub Engineering

LIKE injection

Looking through our exception tracker the other day, I ran across a notice from our slow-query logger that caught my eye. I saw a SELECT … WHERE … LIKE query with lots of percent signs in the LIKE clause. It was pretty obvious that this term was user-provided and my first thought was SQL injection.

[3.92 sec] SELECT ... WHERE (profiles.email LIKE '%64%68%6f%6d%65%73@%67%6d%61%69%6c.%63%6f%6d%') LIMIT 10

Looking at the code, it turned out that we were using a user-provided term directly in the LIKE clause without any checks for metacharacters that are interpreted in this context (%, _, \).

def self.search(term, options = {})
  limit = (options[:limit] || 30).to_i
  friends = options[:friends] || []
  with_orgs = options[:with_orgs].nil? ? false : options[:with_orgs]

  if term.to_s.index("@")
    users = User.includes(:profile)
                .where("profiles.email LIKE ?", "#{term}%")
                .limit(limit).to_a
  else
    users = user_query(term, friends: friends, limit: limit)
  end

  ...
end

While this isn’t full-blown SQL injection, it got me thinking about the impact of this kind of injection. This kind of pathological query clearly has some performance impact because we logged a slow query. The question is how much?

I asked our database experts and was told that it depends on where the wildcard is in the query. With a % in the middle of a query, the database can still check the index for the beginning characters of the term. With a % at the start of the query, indices may not get used at all. This bit of insight led me to run several queries with varied % placement against a test database.

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github.com";
1 row in set (0.00 sec)

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "%ris@github.com";
1 row in set (0.91 sec)

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github%";
1 row in set (0.00 sec)

mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "%c%h%r%i%s%@%g%i%t%h%u%b%.%c%o%m%";
21 rows in set (0.93 sec)

It seems that unsanitized user-provided LIKE clauses do have a potential performance impact, but how do we address this in a Ruby on Rails application? Searching the web, I couldn’t find any great suggestions. There are no Rails helpers for escaping LIKE metacharacters, so we wrote some.

module LikeQuery
  # Characters that have special meaning inside the `LIKE` clause of a query.
  #
  # `%` is a wildcard representing multiple characters.
  # `_` is a wildcard representing one character.
  # `\` is used to escape other metacharacters.
  LIKE_METACHARACTER_REGEX = /([\\%_])/

  # What to replace `LIKE` metacharacters with. We want to prepend a literal
  # backslash to each metacharacter. Because String#gsub does its own round of
  # interpolation on its second argument, we have to double escape backslashes
  # in this String.
  LIKE_METACHARACTER_ESCAPE = '\\\\\1'

  # Public: Escape characters that have special meaning within the `LIKE` clause
  # of a SQL query.
  #
  # value - The String value to be escaped.
  #
  # Returns a String.
  def like_sanitize(value)
    raise ArgumentError unless value.respond_to?(:gsub)
    value.gsub(LIKE_METACHARACTER_REGEX, LIKE_METACHARACTER_ESCAPE)
  end

  extend self

  module ActiveRecordHelper
    # Query for values with the specified prefix.
    #
    # column - The column to query.
    # prefix - The value prefix to query for.
    #
    # Returns an ActiveRecord::Relation
    def with_prefix(column, prefix)
      where("#{column} LIKE ?", "#{LikeQuery.like_sanitize(prefix)}%")
    end

    # Query for values with the specified suffix.
    #
    # column - The column to query.
    # suffix - The value suffix to query for.
    #
    # Returns an ActiveRecord::Relation
    def with_suffix(column, suffix)
      where("#{column} LIKE ?", "%#{LikeQuery.like_sanitize(suffix)}")
    end

    # Query for values with the specified substring.
    #
    # column    - The column to query.
    # substring - The value substring to query for.
    #
    # Returns an ActiveRecord::Relation
    def with_substring(column, substring)
      where("#{column} LIKE ?", "%#{LikeQuery.like_sanitize(substring)}%")
    end
  end
end
ActiveRecord::Base.extend LikeQuery
ActiveRecord::Base.extend LikeQuery::ActiveRecordHelper

We then went through and audited all of our LIKE queries, fixing eleven such cases. The risk of these queries turned out to be relatively low. A user could subvert the intention of the query, though not in any meaningful way. For us, this was simply a Denial of Service (DoS) vector. It’s nothing revolutionary and it is not a new vulnerability class, but it’s something to keep an eye out for. Three second queries can be a significant performance hit and application-level DoS vulnerabilities need to be mitigated.

Update: A number of people pointed out that in Rails 4.2, ActiveRecord includes a sanitize_sql_like helper for sanitizing LIKE clauses.

Git Concurrency in GitHub Desktop GitHub's Metal Cloud