Daniel Kokotajlo comments on Brute force searching for alignment

Daniel Kokotajlo 28 Jun 2021 7:40 UTC
LW: 4 AF: 3
OK, now I get what you are saying! Interesting. I am skeptical that this will work for most alignment problems, perhaps because they lack a simple conceptual core. In particular, I doubt that corrigibility and non-deceptiveness have simple conceptual cores. I hope I’m wrong.
- adamShimi 29 Jun 2021 11:04 UTC
  LW: 4 AF: 3
  Well, if you worry that these properties don’t have a simple conceptual core, maybe you can do the trick where you try to formalize a subset of them that does have a small conceptual core. That’s basically Evan’s move with myopia, treated as an easier-to-study subset of non-deceptiveness.