Evan Hubinger’s Conditioning Predictive Models sequence describes this scenario in detail.
In a great deal of detail, apparently, since it has a recommended reading time of 131 minutes.
Well, at least a subset of the sequence focuses on this. I read the first two essays and was pessimistic enough about the titular approach that I moved on.
Here’s a relevant quote from the first essay in the sequence:

Furthermore, most of our focus will be on ensuring that your model is attempting to predict the right thing. That’s a very important thing almost regardless of your model’s actual capability level. As a simple example, in the same way that you probably shouldn’t trust a human who was doing their best to mimic what a malign superintelligence would do, you probably shouldn’t trust a human-level AI attempting to do that either, even if that AI (like the human) isn’t actually superintelligent.
Also, I don’t recommend reading the entire sequence, if that was an implicit question you were asking. It was more of a “Hey, if you are interested in seeing this scenario fleshed out in significantly greater rigor, you might want to take a look at this sequence!”