John usually does not make his plans with an eye toward making things easier. His plan previously involved values because he thought value alignment was strictly harder than corrigibility: if you solve values, you solve corrigibility. Similarly, if you solve abstraction, you solve interpretability, shard theory, value alignment, corrigibility, etc.
I don’t know all the details of John’s model here, but it may go something like this: if you solve corrigibility and then find out corrigibility isn’t sufficient for alignment, you may expect your corrigible agent to help you build your value-aligned agent.
In what way do you think solving abstraction would solve shard theory?