Roko comments on A Nonconstructive Existence Proof of Aligned Superintelligence

Roko 17 Sep 2024 9:35 UTC
2 points
0
I’m not particularly sold on the idea of launching a powerful argmax search and then doing a bit of handwaving to fix it.

It’s like if you wanted a childminder to look after your young child, set off an argmax search to find the argmax of a function that looks like (quality) / (cost), and then afterwards tried to sort out whether your results were somehow broken/Goodharted.

If your argmax search is over 20 local childminders then that’s probably fine.

But if it’s an argmax search over all possible states of matter occupying an 8 cubic meter volume then… uh yeah that’s really dangerous.
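
A minimal simulation of why the size of the search matters, under assumptions that are not in the thread: each candidate has a true quality drawn from a standard normal, the search only sees a proxy equal to the true quality plus independent Gaussian noise, and `selection_gap` is a hypothetical helper name. It measures how much the proxy overstates the true quality of whichever candidate wins the argmax.

```python
import math
import random
import statistics


def selection_gap(n_candidates, noise_sd=1.0, trials=200, seed=0):
    """Mean of (proxy score - true quality) for the argmax-by-proxy winner.

    Toy model (an assumption, not from the thread): true quality ~ N(0, 1),
    proxy = true quality + N(0, noise_sd) measurement error.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        best_proxy = -math.inf
        best_true = 0.0
        for _ in range(n_candidates):
            true_quality = rng.gauss(0.0, 1.0)
            proxy = true_quality + rng.gauss(0.0, noise_sd)
            if proxy > best_proxy:
                best_proxy, best_true = proxy, true_quality
        gaps.append(best_proxy - best_true)
    return statistics.mean(gaps)


# Compare a single candidate, a 20-way search, and a 2000-way search.
for n in (1, 20, 2000):
    bits = math.log2(n)  # bits of selection applied by an N-way argmax
    print(f"N={n:>5}  bits={bits:5.2f}  mean overestimate={selection_gap(n):.2f}")
```

In this toy model the overestimate is near zero for a single candidate and grows as the search widens, which is the sense in which a wider argmax is more exposed to Goodhart effects. It says nothing about searches over arbitrary states of matter, where the error model itself is unknown.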
- RogerDearnaley 17 Sep 2024 22:29 UTC
  10 points
  0
  The pessimizing over Knightian uncertainty is a graduated way of telling the model to basically “tend to stay inside the training distribution”. Adjusting its strength enough to overcome the Look-Elsewhere Effect means we estimate how many bits of optimization pressure we’re applying and then do the pessimizing harder depending on that number of bits, which, yes, is vastly higher for all possible states of matter occupying an 8 cubic meter volume than for a 20-way search (the former is going to be a rather large multiple of Avogadro’s number of bits, the latter is just over 4 bits). So we have to stay inside what we believe we know a great deal harder in the former case. In other words, the point you’re raising is already addressed, in a quantified way, by the approach I’m outlining. Indeed, on some level the main point of my suggestion is that there is a quantified and theoretically motivated way of dealing with exactly this problem. The handwaving above is just a very brief summary, accompanied by a link to a much more detailed post containing and explaining the details with a good deal less handwaving.
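  
  As a rough worked version of the arithmetic above, here is a sketch under assumptions that are not taken from the linked post: the search picks the best of N candidates, and each candidate's score carries Gaussian error with standard deviation σ. The symbols b, N, σ, and ε are introduced only for this illustration.

  ```latex
  \begin{align}
    b &= \log_2 N, & \log_2 20 &\approx 4.32 \text{ bits} \\
    \mathbb{E}\Big[\max_{1 \le i \le N} \varepsilon_i\Big]
      &\le \sigma \sqrt{2 \ln N} = \sigma \sqrt{2\, b \ln 2},
      & \sqrt{2 \ln 20} &\approx 2.45
  \end{align}
  ```

  Under these assumptions, the amount by which the winning score can be inflated by luck grows like the square root of the number of bits, so a pessimizing penalty of roughly that size would offset it. Whether the linked post uses this exact form is not claimed here.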
  
  Trying to explain this piecemeal in a comments section isn’t very efficient: I suggest you go read Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect for my best attempt at a detailed exposition of this part of the suggestion. If you still have criticisms or concerns after reading that, then I’d love to discuss them there.
  - Roko 19 Sep 2024 9:39 UTC
    2 points
    0
    OK, that’s a fair point. I’ll take a look, but I am still skeptical about being able to do this in practice, because in practice the universe is messy.
    
    E.g. if you’re looking for an optimal practical babysitter and you really do start a search over all possible combinations of matter that fit inside a 2x2x2 cube, and then start futzing with the results of that search, I think it will go wrong.
    
    But if you adopt some constructive approach with some empirically grounded heuristics I expect it will work much better. E.g. start with a human. Exclude all males (sorry bros!). Exclude based on certain other demographics which I will not mention on LW. Exclude based on nationality. Do interviews. Do drug tests. Etc.
    
    Your set of states of a 2x2x2 cube of matter will contain all kinds of things that are bad in ways you don’t understand.