DanLevy.net

Security Notes: RegEx

Can RegEx be vulnerable?


Photo by Markus Spiske on Unsplash

RegEx Denial-of-Service: ReDOS

One of the more surprising, and yet hard-to-spot, vulnerabilities I’ve found is related to regular expressions, whether poorly written or poorly implemented.

Memory/CPU can be exhausted with large or specially crafted user input.

This is a denial-of-service vulnerability, not just a performance smell. If hostile input can pin CPU long enough to starve real users, it belongs in your security threat model.

Warning Signs

  1. Nested quantifiers, repeated groups, or overlapping alternation
  2. Backtracking-heavy engines with no timeout or input-length limit
  3. Expressions that run against unchecked user input
  4. Regex validation runs on a hot request path
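
To make the first warning sign concrete, here is a minimal sketch in Python of a nested-quantifier pattern meeting hostile input. The pattern `^(a+)+$`, the `VULNERABLE` name, and the `time_match` helper are illustrative assumptions, not taken from any real codebase. Timings vary by machine, so the sketch prints them rather than promising specific numbers.

```python
import re
import time

# Illustrative pattern (an assumption for this sketch): the inner `a+` and
# the outer `(...)+` can split a run of "a" characters in exponentially
# many ways, which a backtracking engine will try one by one.
VULNERABLE = re.compile(r"^(a+)+$")


def time_match(n: int) -> tuple[bool, float]:
    """Match n 'a' characters followed by '!' and return (matched, seconds)."""
    hostile = "a" * n + "!"  # the trailing '!' forces the overall match to fail
    start = time.perf_counter()
    matched = VULNERABLE.match(hostile) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the work.
    for n in (10, 14, 18, 20):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} seconds={seconds:.4f}")
```

The input is only a couple dozen characters long, which is why a length limit alone may not be enough for a pattern shaped like this one.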

Mitigation / Resolution

  1. RegEx is hard.
    1. For example, here is how the really smart folks at [OWASP recommend handling IP validation][owasp]: ^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
    2. That’s longer than an (old-school) tweet, for a 4-byte IPv4 address!!!
  2. Bound input length before regex evaluation.
  3. Add timeouts, static analysis, or a non-backtracking engine where the platform supports it.
  4. This affects almost every language and platform: .NET, Node, Python, Perl, Java.
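
As a rough sketch of bounding input length before regex evaluation, the Python below wraps the OWASP-style IPv4 pattern quoted above in a length check. The names `OCTET`, `IPV4`, `MAX_IPV4_LENGTH`, and `is_valid_ipv4` are hypothetical, chosen for illustration. The cap of 15 characters assumes dotted-quad IPv4 only, since `255.255.255.255` is the longest string the pattern can accept.

```python
import re

# The IPv4 pattern quoted above, assembled from one repeated octet fragment.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(r"^" + r"\.".join([OCTET] * 4) + r"$")

# Assumption: dotted-quad IPv4 only, so "255.255.255.255" is the longest input.
MAX_IPV4_LENGTH = 15


def is_valid_ipv4(candidate: str) -> bool:
    """Reject oversized input cheaply, then run the regex on what is left."""
    if len(candidate) > MAX_IPV4_LENGTH:
        return False  # never hand unbounded input to the regex engine
    # fullmatch requires the whole string to match, including any trailing newline.
    return IPV4.fullmatch(candidate) is not None
```

For example, `is_valid_ipv4("192.168.0.1")` passes, while `is_valid_ipv4("1" * 100000)` is rejected by the length check before the regex ever runs. As far as I know, Python's standard `re` module has no timeout parameter, so on that platform the length bound carries most of the weight.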

Reference