Oops! Well, I didn't read the whole post carefully to the end, and that's what you get! OK, second try after reading the post carefully:
"It seems like a simple set of desiderata ought to capture the true name of what it means for an AI to lead to good worlds."
I think I have been thinking about something similar, and my best description of this desideratum is pragmatism: something like "use a prior that works in the worlds where we haven't already lost." It's easy to construct toy models where alignment is impossible, so the desideratum becomes a regret bound with respect to some prior, though I don't yet know what that prior looks like.
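To gesture at what I mean (this is my own sketch with my own notation, not something from the post): write $W$ for the set of worlds where we haven't already lost, $\xi$ for a prior over environments $\mu$, and $V_\mu^\pi(T)$ for the expected cumulative value a policy $\pi$ achieves in environment $\mu$ over horizon $T$. Then the pragmatic desideratum might be a Bayesian regret bound under the prior conditioned on $W$:

\[
\mathrm{Reg}_T(\pi, \xi) \;=\; \mathbb{E}_{\mu \sim \xi(\cdot \mid W)}\!\left[\, \sup_{\pi'} V_\mu^{\pi'}(T) - V_\mu^{\pi}(T) \right] = o(T).
\]

That is, we only ask for guarantees relative to worlds where success was possible in the first place; in the impossible toy models, any policy is vacuously acceptable.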