Today I Learned

Working with ElasticSearch and Rails; part 4

Imagine you find yourself dealing with the following case: you want to let users search for employees by name. Easy. You quickly assemble a query similar to the one below:

GET company_employee/employee/_search
{
   "query": {
      "bool": {
         "must": [
            {
                "match_phrase": {
                   "name": "Charles"
                }
            }
         ]
      }
   }
}

Where “Charles” is the user input. However, you quickly realize (or your client helps you realize ;) ) that you actually need to retrieve every “Charles”, even if the user types ChaRleS, charles or CHarles into the form.

Assuming that changing the index config is not an option, you can change the query instead and go with the regexp approach. The caveat is that the regexp query doesn’t allow for case-insensitive searching, but you can always make it do so “manually”. Here is how:

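# Regexp fragments for the characters we escape before sending the pattern to
# Elasticsearch: the optional Lucene operators (# @ & < > ~), the backslash and the dot.
ESCAPE_ES_CHARS = %w(# @ & < > ~ \\\\ \.).freeze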

def filter
  {
    bool: {
      must:
        query_strings.map do |query_string|
          {
            bool: {
              should: fields.map do |field|
                {
                  regexp: {
                    field => { value: ".*(#{query_string}).*" }
                  }
                }
              end
            }
          }
        end
    }
  }
end

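# Turns every whitespace-separated keyword of the user input into a
# case-insensitive pattern, e.g. "Charles" => "[Cc][Hh][Aa][Rr][Ll][Ee][Ss]".
def query_strings
  @query_strings ||= q.split.map do |keyword|
    qs = keyword.chars.map(&:downcase).map { |char| "[#{[char.upcase, char].uniq.join}]" }.join
    escape_regexp_string(qs)
  end
end

# Prefixes every character matched by ESCAPE_ES_CHARS with a backslash.
def escape_regexp_string(str)
  str.gsub(Regexp.new(ESCAPE_ES_CHARS.join('|'))) do |match|
    '\\' + match
  end
end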

Where q is the user input and fields denotes the collection of fields we would like to match against. My example code comes from a slightly more advanced case, but the idea is exactly the same: programmatically create regexps from the user input, so that our Charles ends up mangled like the one below (shown here without the surrounding .* wildcards):

GET company_employee/employee/_search
{
   "query": {
      "regexp": { "name": "[Cc][Hh][Aa][Rr][Ll][Ee][Ss]" }
   }
}
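
To make the transformation easier to try outside a Rails app, here is a minimal standalone sketch of the same idea. The names LUCENE_RESERVED, case_insensitive_pattern and regexp_query are made up for this illustration, and the list of reserved characters is my assumption based on the Lucene regexp syntax, so check it against your Elasticsearch version. Unlike the code above, it escapes each reserved character up front and only wraps characters that actually have distinct upper and lower case forms.

```ruby
# Characters assumed to be special in Lucene regexp syntax (hypothetical list,
# verify it for the Elasticsearch version you run).
LUCENE_RESERVED = '.?+*|{}[]()"\\#@&<>~'.chars.freeze

# Builds a pattern that matches the keyword regardless of letter case.
def case_insensitive_pattern(keyword)
  keyword.each_char.map do |char|
    if LUCENE_RESERVED.include?(char)
      "\\#{char}"                          # escape so it matches literally
    elsif char.upcase == char.downcase
      char                                 # digits and the like have no case
    else
      "[#{char.upcase}#{char.downcase}]"   # e.g. "c" becomes "[Cc]"
    end
  end.join
end

# Wraps the pattern in a regexp query that matches anywhere inside the term.
def regexp_query(field, keyword)
  { query: { regexp: { field => ".*#{case_insensitive_pattern(keyword)}.*" } } }
end
```

With these definitions, regexp_query(:name, "ChaRleS") returns a hash whose name pattern is ".*[Cc][Hh][Aa][Rr][Ll][Ee][Ss].*", which can be passed to the search call as the request body.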