If the model that is used as a Microscope AI does not use any optimization (search), how will it compute the probability that, say, Apple’s engineers will overcome a certain technical challenge?
That’s a good question. Perhaps it does make use of optimization, but the model still has an overall passive relationship to the world compared to an active mesa-optimizer AI. I’m thinking of the difference between, say, GPT-3 and the classic paperclip maximizer or other tiling AI.
This is just my medium-confidence understanding and may be different from what Evan Hubinger meant in that quote.
I believe it would look like Microscope AI.