Sometimes the environment really is adversarial, though.
Regular expressions as implemented in many programming languages work fine for almost all inputs, but their backtracking implementations have terrible worst-case upper bounds. This is why libraries like re2 are needed if you want to let anyone on the Internet use regular expressions in your search engine.
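As a minimal sketch of the failure mode (using Python's built-in re module, which backtracks): the nested quantifiers in `(a+)+` force the engine to try exponentially many ways of splitting the a's whenever the overall match fails.

```python
import re
import time

# Classic catastrophic-backtracking pattern: nested quantifiers.
pattern = re.compile(r"^(a+)+$")

for n in (18, 20, 22, 24):
    s = "a" * n + "b"  # the trailing "b" guarantees a failed match
    t0 = time.perf_counter()
    pattern.match(s)
    elapsed = time.perf_counter() - t0
    print(n, f"{elapsed:.3f}s")  # time grows exponentially with n
```

An engine like re2, built on finite automata, matches the same pattern in linear time because it never backtracks.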
Another example that comes to mind: quicksort runs in O(n·log n) expected time if the input is randomly shuffled beforehand (or, equivalently, if pivots are chosen at random), whereas a deterministic pivot rule lets an adversary construct inputs that force quadratic time.
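A quick sketch of the randomized version (a toy illustration, not a tuned implementation):

```python
import random

def quicksort(xs):
    """Randomized quicksort: O(n log n) expected time on every input."""
    if len(xs) <= 1:
        return xs
    # A uniformly random pivot means no fixed input ordering can
    # reliably trigger the O(n^2) worst case.
    pivot = random.choice(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```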
The Internet can be pretty scary, but it is still very much short of an adversarial superintelligence. Sure, sometimes people will want to bring your website down, and sometimes it could be an organized effort, but even that is nothing like a genuine worst case.
OK, granted, in the case of general regex we’re talking about upper bounds that are exponential in the worst case, and it’s not very difficult to come up with really bad patterns, so a single query could cause you major issues.
That doesn’t necessarily mean you should switch to a library that avoids those worst cases, though. You could simply reject the kinds of regexes that could potentially cause those issues, or run your regex matcher with a timeout and have the search fail whenever the timer expires.
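A rough sketch of the timeout approach, assuming you're willing to pay for a worker process per match (the function names here are hypothetical; it has to be a process rather than a thread because a thread can't interrupt a match already running inside Python's re):

```python
import re
from multiprocessing import Process, Queue

def _match_worker(pattern, text, q):
    q.put(bool(re.search(pattern, text)))

def search_with_timeout(pattern, text, seconds=1.0):
    # Run the match in a separate process so a pathological pattern
    # can be killed outright when the timer expires.
    q = Queue()
    p = Process(target=_match_worker, args=(pattern, text, q))
    p.start()
    p.join(seconds)
    if p.is_alive():
        p.terminate()  # the search simply fails on timeout
        p.join()
        return None    # caller treats None as "timed out"
    return q.get()
```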
I wouldn’t mind knowing the reason I was downvoted here.
Seems like noise to me. Only one person cared enough about what you wrote to vote, and they disliked your post. Upvoted back to zero, since the comments seem pretty reasonable to me.
Thanks; it didn’t occur to me to think of it statistically, although it seems obvious now.
That said, individual downvotes / upvotes still have reasons behind them; it’s just that you can’t generally expect to find out what they are except when they come in significant clusters.
If you have a search engine, you don’t want to let people scan your whole corpus anyway. You usually want to search an index.
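For illustration, a toy inverted index (hypothetical names, a sketch rather than anything production-ready): a query only touches the postings lists for its terms, never the full corpus.

```python
from collections import defaultdict

def build_index(docs):
    # Map each token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = ["regex engines can backtrack badly",
        "an index makes search cheap"]
index = build_index(docs)
print(sorted(index["index"]))  # [1]
```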