Security Notes: RegEx
Can RegEx be vulnerable?
Photo by Markus Spiske on Unsplash
RegEx Denial-of-Service: ReDoS
One of the more surprising, and yet hard-to-spot, vulnerabilities I’ve found is related to regular expressions that are either poorly written or poorly implemented.
Memory/CPU can be exhausted with large or specially crafted user input.
This is a denial-of-service vulnerability, not just a performance smell. If hostile input can pin CPU long enough to starve real users, it belongs in your security threat model.
Warning Signs
- Nested quantifiers, repeated groups, or overlapping alternation
- Backtracking-heavy engines with no timeout or input-length limit
- Expression is used with unchecked user input
- Regex validation runs on a hot request path
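To make the first warning sign concrete, here is a minimal Python sketch of a nested quantifier running on Python's backtracking `re` engine. The pattern `^(a+)+$` and the helper name `time_failed_match` are illustrative assumptions of mine, not taken from any real codebase. Exact timings depend on your machine, but the rejection time should climb steeply as the input grows by only a few characters.

```python
import re
import time

# Nested quantifier: the inner a+ and the outer (...)+ can split a run of
# "a" characters in exponentially many ways. (Illustrative pattern only.)
EVIL = re.compile(r"^(a+)+$")


def time_failed_match(n: int) -> float:
    """Return the seconds spent rejecting n "a" characters followed by "!"."""
    payload = "a" * n + "!"  # the trailing "!" guarantees the match fails
    start = time.perf_counter()
    EVIL.match(payload)  # the engine tries every split before giving up
    return time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the work.
    for n in (8, 12, 16, 20):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

An attacker does not need a large payload here; a few dozen characters is enough to pin a CPU core on an engine that backtracks.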
Mitigation / Resolution
- RegEx is hard.
- For example, here is how the really smart folks at [OWASP recommend handling IP validation][owasp]:
  ^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
- That’s longer than an (old-school) tweet, for a 4-byte IP address!
- Bound input length before regex evaluation.
- Add timeouts, static analysis, or a non-backtracking engine where the platform supports it.
- This affects almost every language and platform whose default regex engine backtracks: .NET, Node, Python, Perl, Java.
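As a rough sketch of the "bound input length" advice, the Python below rejects oversized input before the regex ever runs, then applies the OWASP expression quoted above. The names `is_valid_ipv4`, `OCTET`, and `MAX_IPV4_LEN` are my own, and the 15-character cap assumes dotted-quad IPv4 only; treat it as one way to wire this up, not a definitive implementation.

```python
import re

# One octet (0-255), copied from the OWASP expression quoted above.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Four octets joined by literal dots. fullmatch() below anchors both ends,
# so the ^ and $ from the original expression are not needed here.
IPV4_RE = re.compile(r"\.".join([OCTET] * 4))

# "255.255.255.255" is 15 characters; no dotted-quad address is longer.
# (Assumed limit for this sketch; pick a bound that fits your own input.)
MAX_IPV4_LEN = 15


def is_valid_ipv4(candidate: str) -> bool:
    """Cheap length check first, regex second."""
    if len(candidate) > MAX_IPV4_LEN:
        return False  # oversized input never reaches the regex engine
    return IPV4_RE.fullmatch(candidate) is not None
```

The length check costs almost nothing and caps the worst-case work for any pattern placed behind it. Python's standard `re` module has no timeout parameter, so on that platform a length bound is the simplest of the mitigations listed above.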