Today I Learned

Elasticsearch nad Catastrophic Backtracking threats

The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size).”

The above assumes that there’s some kind of user with a malicious intent, who submits an evil regexp to the backend, stacking the web server. But that assumption impose another one which precedes it: there may be a sloppy programmer, who let that happen.

In one of my previous TILs I’vbe been sharing an example approach for building up regexps for ES querying and filtering, basing it on the user input. We had a validation in place which ensured that the regexp we end up with, would be a valid one. But what about catastrophic backtracking? Can you do something about it?

I did some research and it turns out that Elasticsearch itself does not have any native mechanism implemented which would put us on the safe side of the story. If anyone is curious and wants to know why, here is a bunch of links to check: https://github.com/elastic/elasticsearch/issues/17934 -> https://issues.apache.org/jira/browse/LUCENE-7256 -> http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039269.html

So Elasticsearch gives us no safety net. What about Ruby? Can it bring something up to the table? I’ll use a quote here:

Ruby uses the “Non-recursive Backtracking Implementation” discussed in the second article, which means that it does exhibit the same exponentially slow performance as Perl does for pathological regex expressions. In other words, this means that Ruby is NOT using the most optimal regex algorithm available, Thompson NFA, that Russ described in the first article.”

Hence you should not expect that you would be able to validate regexp (using Ruby) before passing it into the Elasticsearch engine.

But there is a thing you can actually do, when looking at my previous example.We defined special chars as follows:

ESCAPE_ES_CHARS = %w(# @ & < > ~ \\\\ \.).freeze

and striped it in a following way:

str.gsub(Regexp.new(ESCAPE_ES_CHARS.join('|')))

What we actually do is we can use the build in Ruby method: https://ruby-doc.org/core-3.0.0/Regexp.html#method-c-escape

Also, as Ruby engineers we can always roll up our sleeves an come up with some star height rating analyzer, basing it on https://github.com/substack/safe-regex or other implementations which can be found on the Internet.