If the model that is used as a Microscope AI does not use any optimization (search), how will it compute the probability that, say, Apple’s engineers will overcome a certain technical challenge?
That’s a good question. Perhaps it does make use of optimization, but the model still has an overall passive relationship to the world compared to an active mesa-optimizer AI. I’m thinking of the difference between, say, GPT-3 and the classic paperclip maximizer or other tiling AI.
This is just my medium-confidence understanding and may be different from what Evan Hubinger meant in that quote.