What you are looking for sounds very much like Vanessa Kosoy’s agenda.
As it so happens, the author of the post also wrote this overview post on Vanessa Kosoy’s PreDCA protocol.
Oops! Well, I did not carefully read the whole post to the end and that’s what you get! Ok second try after reading the post carefully:
> it seems like a simple set of desiderata ought to capture the true name of what it means for an AI to lead to good worlds.
I have been thinking along similar lines, and my best description of this desideratum is pragmatism: something like “use a prior that works” in the worlds where we haven’t already lost. It’s easy to build toy models where alignment is impossible. → regret bounds for some prior, though I don’t yet know what that prior looks like.
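For concreteness, here is a minimal sketch of what “regret bounds for some prior” could mean, using the standard Bayesian-regret formulation from learning theory. The symbols here (the prior ζ, the utility U, the hypothesis class) are my own placeholders, not notation from the post or from Kosoy’s actual formalism:

```latex
% A sketch only: standard Bayesian-regret notation, not anything from
% the post or from Kosoy's agenda. \zeta (the prior), U (the utility),
% and \mathcal{H} (the hypothesis class) are placeholder names.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

Given a prior $\zeta$ over a hypothesis class of environments
$\mathcal{H}$, define the regret of a policy $\pi$ in environment
$\mu \in \mathcal{H}$ as its utility shortfall against the best
policy for that environment:
\[
  \operatorname{Reg}(\pi, \mu)
    = \max_{\pi'} \mathbb{E}^{\pi'}_{\mu}[U]
      - \mathbb{E}^{\pi}_{\mu}[U].
\]
A \emph{regret bound for the prior} $\zeta$ then asks for a single
policy $\pi$ whose expected regret under $\zeta$ is small:
\[
  \mathbb{E}_{\mu \sim \zeta}\bigl[\operatorname{Reg}(\pi, \mu)\bigr]
    \le \varepsilon.
\]
``Use a prior that works'' would mean choosing $\zeta$ to put its
mass on the worlds where we have not already lost, so that such a
low-regret $\pi$ can exist at all.

\end{document}
```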