In what ways do you see meta decision theory as different from the notion of alignment? At the current level of detail they sound to me like they are describing the same thing, although alignment is perhaps less pre-committed to decision theory as a solution, even if that is what alignment research is focused on now.
Maybe the intended interpretation is that they are functionally equivalent, but since you didn’t specify I wanted to clarify.
Thanks! I should put links to the previous works in what is turning out to be a mini-sequence.

The first one defines the normal computer control problem and then decomposes it.
The short answer is that I'm trying to solve an easier, related problem (a control problem that assumes no superintelligence in the computer) in the hope that it will give me insights into both. The insight I present here is that formal decision-theoretic decisions are too slow to realistically control things like networks, and meta-level decision theories might provide a way forward. I think this insight applies to both problems.
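To make the speed argument concrete, here is a minimal sketch of the kind of meta-level decision procedure I have in mind: a controller that estimates whether a slow, formal decision-theoretic computation can finish inside the control loop's deadline, and falls back to a cheap reactive policy when it cannot. Everything here (the names `deliberate`, `react`, `meta_decide`, the cost estimate, the toy network state) is my own hypothetical illustration under those assumptions, not a worked-out theory.

```python
import time

def deliberate(state, actions, utility, horizon=3):
    """Slow policy: exhaustive expected-utility search over action sequences.
    Stands in for a formal decision-theoretic computation."""
    def best(state, depth):
        if depth == 0:
            return utility(state), []
        scored = []
        for a in actions:
            value, plan = best(a(state), depth - 1)
            scored.append((value, [a] + plan))
        return max(scored, key=lambda vp: vp[0])
    return best(state, horizon)[1][0]  # first action of the best plan

def react(state, actions, utility):
    """Fast policy: greedy one-step lookahead; cheap but myopic."""
    return max(actions, key=lambda a: utility(a(state)))

def meta_decide(state, actions, utility, deadline_s):
    """Meta-level rule: deliberate if the estimated cost fits the deadline,
    otherwise fall back to the reactive policy."""
    start = time.monotonic()
    utility(state)  # probe the cost of one utility evaluation
    est_cost = (time.monotonic() - start) * len(actions) ** 3  # horizon-3 tree
    if est_cost < deadline_s:
        return deliberate(state, actions, utility)
    return react(state, actions, utility)

# Toy network-control example: state is (load, capacity); actions shed or
# accept traffic; utility penalizes load over capacity.
if __name__ == "__main__":
    shed = lambda s: (max(0, s[0] - 10), s[1])
    accept = lambda s: (s[0] + 10, s[1])
    utility = lambda s: -max(0, s[0] - s[1])
    action = meta_decide((95, 100), [shed, accept], utility, deadline_s=0.001)
    print("chosen action:", "shed" if action is shed else "accept")
```

The point of the sketch is only that the choice of *which* decision procedure to run is itself a decision, made at the meta level under the same time constraints that make the object-level computation too slow.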
Aside: I think the AI alignment and AI control problems might be equivalent. But I could see arguments for AI alignment being solved by something acausal in nature, e.g. CEV is not controlled by the actions of people but is aligned to what people want.