Sometimes the environment really is adversarial, though.
Regular expressions as implemented in many programming languages work fine for almost all inputs, but their backtracking implementations have terrible worst-case upper bounds. This is why libraries like re2 are needed if you want to let anyone on the Internet use regular expressions in your search engine.
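As a minimal sketch of the failure mode (using Python's built-in re module, which backtracks): the nested quantifiers in `(a+)+` force the engine to try exponentially many ways of splitting the a's whenever the overall match fails.

```python
import re
import time

# Classic catastrophic-backtracking pattern: nested quantifiers.
pattern = re.compile(r"^(a+)+$")

for n in (18, 20, 22, 24):
    s = "a" * n + "b"  # the trailing "b" guarantees a failed match
    t0 = time.perf_counter()
    pattern.match(s)
    elapsed = time.perf_counter() - t0
    print(n, f"{elapsed:.3f}s")  # time grows exponentially with n
```

An engine like re2, built on finite automata, matches the same pattern in linear time because it never backtracks.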
Another example that comes to mind: quicksort runs in O(n·log n) expected time if the input is randomly shuffled beforehand (or, equivalently, if pivots are chosen at random), whereas a deterministic pivot rule lets an adversary construct inputs that force quadratic time.
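A quick sketch of the randomized version (a toy illustration, not a tuned implementation):

```python
import random

def quicksort(xs):
    """Randomized quicksort: O(n log n) expected time on every input."""
    if len(xs) <= 1:
        return xs
    # A uniformly random pivot means no fixed input ordering can
    # reliably trigger the O(n^2) worst case.
    pivot = random.choice(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```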
The Internet can be pretty scary, but it is still very much short of an adversarial superintelligence. Sure, sometimes people will want to bring your website down, and sometimes it could be an organized effort, but even that is nothing like a genuine worst case.
OK, granted, in the case of general regex we’re talking about upper bounds that are exponential in the worst case, and it’s not very difficult to come up with really bad patterns, so a single query could cause you major issues.
That doesn’t necessarily mean you should switch to a library that avoids those worst cases, though. You could simply reject the kinds of regexes that could potentially cause those issues, or run your regex matcher with a timeout and have the search fail whenever the timer expires.
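A rough sketch of the timeout approach, assuming you're willing to pay for a worker process per match (the function names here are hypothetical; it has to be a process rather than a thread because a thread can't interrupt a match already running inside Python's re):

```python
import re
from multiprocessing import Process, Queue

def _match_worker(pattern, text, q):
    q.put(bool(re.search(pattern, text)))

def search_with_timeout(pattern, text, seconds=1.0):
    # Run the match in a separate process so a pathological pattern
    # can be killed outright when the timer expires.
    q = Queue()
    p = Process(target=_match_worker, args=(pattern, text, q))
    p.start()
    p.join(seconds)
    if p.is_alive():
        p.terminate()  # the search simply fails on timeout
        p.join()
        return None    # caller treats None as "timed out"
    return q.get()
```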
I wouldn’t mind knowing the reason I was downvoted here.
Seems like noise to me. Only one person cared enough about what you wrote to vote, and they disliked your post. Upvoted back to zero, since the comments seem pretty reasonable to me.
Thanks; it didn’t occur to me to think of it statistically, although it seems obvious now.
That said, individual downvotes / upvotes still have reasons behind them; it’s just that you can’t generally expect to find out what they are except when they come in significant clusters.
If you have a search engine, you don’t want to let people scan your whole corpus anyway. You usually want to search an index.
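For illustration, a toy inverted index (hypothetical names, a sketch rather than anything production-ready): a query only touches the postings lists for its terms, never the full corpus.

```python
from collections import defaultdict

def build_index(docs):
    # Map each token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = ["regex engines can backtrack badly",
        "an index makes search cheap"]
index = build_index(docs)
print(sorted(index["index"]))  # [1]
```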