Our very broad hope is to use ELK to select actions that (i) keep humans safe and give them time and space to evolve according to their current (essentially local) preferences, and (ii) are expected to produce outcomes that would be judged favorably by those future humans, primarily by maximizing option value until it becomes clear what those future humans want (see the strategy-stealing assumption).
This is discussed very briefly in this appendix of the ELK report and the subsequent appendix. There are two or three big foreseeable difficulties with this approach and likely a bunch of other problems.
I don’t think this should be particularly persuasive, but it hopefully illustrates how ARC is currently thinking about this part of the problem. Overall, my current view is that this is fairly unlikely to be the weakest link in the plan, i.e., if the plan doesn’t work, it will be because of a failure at an earlier step, and so this is not one of the main things I’m thinking about.