RogerDearnaley comments on A Nonconstructive Existence Proof of Aligned Superintelligence

RogerDearnaley 17 Sep 2024 22:29 UTC
10 points
The pessimizing over Knightian uncertainty is a graduated way of telling the model, in effect, to “tend to stay inside the training distribution”. Adjusting its strength enough to overcome the Look-Elsewhere Effect means we estimate how many bits of optimization pressure we’re applying, and then pessimize harder the larger that number of bits is. Yes, that number is vastly higher for all possible states of matter occupying an 8 cubic meter volume than for a 20-way search: the former is going to be a rather large multiple of Avogadro’s number of bits, while the latter is just over 4 bits (log₂ 20 ≈ 4.32). So in the former case we have to stay a great deal harder inside what we believe we know. In other words, the point you’re raising is already addressed, in a quantified way, by the approach I’m outlining. Indeed, on some level the main point of my suggestion is that there is a quantified and theoretically motivated way of dealing with exactly this problem. The handwaving above is just a very brief summary, accompanied by a link to a much more detailed post that explains the details with a good deal less handwaving.
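To make the bit-counting arithmetic concrete, here is a minimal Python sketch under stated assumptions. It assumes, purely for illustration, that each candidate’s value estimate carries Gaussian noise with standard deviation `sigma`. It then uses the standard bound that the expected maximum of N such noise terms is at most σ√(2 ln N) as a stand-in look-elsewhere correction; with N = 2^b this becomes σ√(2 b ln 2). The function names, the Gaussian noise model, and the bit count used for the 8 cubic meter case are all hypothetical. This is not the specific method of the linked post.

```python
import math


def optimization_bits(num_candidates: float) -> float:
    """Bits of optimization pressure applied by picking the best of num_candidates."""
    return math.log2(num_candidates)


def pessimism_margin(bits: float, sigma: float = 1.0) -> float:
    """Hypothetical look-elsewhere margin for a search over 2**bits candidates.

    Assumes each candidate's value estimate has Gaussian noise with std dev sigma
    and uses the bound E[max of N such noise terms] <= sigma * sqrt(2 ln N),
    rewritten with ln N = bits * ln 2 so 2**bits never has to be computed.
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    return sigma * math.sqrt(2.0 * bits * math.log(2.0))


def pessimized_score(estimate: float, sigma: float, bits: float) -> float:
    """Discount a candidate's estimated value by the margin for the search size."""
    return estimate - pessimism_margin(bits, sigma)


# A 20-way search: just over 4 bits, so only a modest discount.
bits_20 = optimization_bits(20)
print(f"20-way search: {bits_20:.2f} bits, margin {pessimism_margin(bits_20):.2f} sigma")

# Hypothetical bit count for the 8 cubic meter search (about 10x Avogadro's number).
bits_matter = 6e24
print(f"matter search: margin {pessimism_margin(bits_matter):.2e} sigma")
```

Working in bits rather than raw candidate counts keeps the arithmetic finite even for the 8 cubic meter case. Under these assumptions the 20-way search gets a margin of about 2.45σ, while the hypothetical 6 × 10²⁴-bit search gets about 2.9 × 10¹² σ, so any candidate not extremely well understood is effectively ruled out.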

Trying to explain this piecemeal in a comments section isn’t very efficient: I suggest you go read Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect for my best attempt at a detailed exposition of this part of the suggestion. If you still have criticisms or concerns after reading that, then I’d love to discuss them there.
- Roko 19 Sep 2024 9:39 UTC
  2 points
  OK, that’s a fair point. I’ll take a look, but I am still skeptical about being able to do this in practice, because in practice the universe is messy.
  
  E.g. if you’re looking for an optimal practical babysitter, and you really do start a search over all possible combinations of matter that fit inside a 2x2x2 cube and then start futzing with the results of that search, I think it will go wrong.
  
  But if you adopt some constructive approach with some empirically grounded heuristics, I expect it will work much better. E.g. start with a human. Exclude all males (sorry bros!). Exclude based on certain other demographics which I will not mention on LW. Exclude based on nationality. Do interviews. Do drug tests. Etc.
  
  Your set of states of a 2x2x2 cube of matter will contain all kinds of things that are bad in ways you don’t understand.