What you are looking for sounds very much like Vanessa Kosoy’s agenda (formal guarantees, regret bounds); that’s the best post I know of explaining her agenda. If you liked logical induction, definitely look into Infrabayesianism! It’s very dense, so I would recommend starting with a short introduction, or just looking for good material under the infrabayesianism tag. The current state of affairs is that we don’t have these guarantees yet, or at least only under unsatisfying assumptions.
i am somewhat familiar with vanessa’s work, and it contributed to inspiring this post of mine. i’d like to understand infrabayesianism better, and maybe that intro will help with that, thanks!
What you are looking for sounds very much like Vanessa Kosoy’s agenda
As it so happens, the author of the post also wrote this overview post on Vanessa Kosoy’s PreDCA protocol.
Oops! Well, I didn’t read the whole post carefully to the end, and that’s what you get! OK, second try after reading the post carefully:
it seems like a simple set of desiderata ought to capture the true name of what it means for an AI to lead to good worlds.
I think I have been thinking along similar lines, and my best description of this desideratum is pragmatism: something like “use a prior that works” in the worlds where we haven’t already lost. It’s easy to construct toy models where alignment is impossible, so what we want is a regret bound with respect to some prior, though I don’t yet know what that prior looks like.
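For concreteness, here is a rough sketch of what a regret bound with respect to a prior could look like, roughly in the spirit of Kosoy’s learnability results; the notation below ($\zeta$, $\mu$, $\pi$, $\gamma$, $\mathrm{EU}$) is my own illustration, not anything from the post:

$$\mathrm{Reg}(\pi,\mu,\gamma) \;:=\; \mathrm{EU}^{*}_{\mu}(\gamma) \;-\; \mathrm{EU}^{\pi}_{\mu}(\gamma), \qquad \lim_{\gamma \to 1} \mathrm{Reg}(\pi,\mu,\gamma) = 0 \;\text{ for } \zeta\text{-almost-all } \mu,$$

where $\zeta$ is the prior over environments, $\pi$ is the agent’s policy, $\gamma$ is the time discount, and $\mathrm{EU}^{*}_{\mu}$ is the expected utility of the $\mu$-optimal policy. The open question flagged above is which $\zeta$ to pick so that a bound like this says something useful in the worlds where we haven’t already lost.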