Thanks!
I guess I was just thinking that sometimes every option is out-of-distribution, because the future is different from the past, especially when we want AGIs to invent new technologies etc.
I agree that adversarially-chosen OOD hypotheticals are very problematic.
I think Stuart Armstrong thinks the end goal has to be a utility function because utility-maximizers are in reflective equilibrium in a way that other systems aren’t; he talks about that here.