What the proofs actually mean in practice is obviously up for debate, but I think that a pretty reasonable interpretation is that they’re something like analogies which help us get a handle on how powerful the different proposals are in theory.
I’m curious if you agree with the inference of conclusions 1 and 2 from premises 1, 2, and 3, and/or the underlying claim that it’s bad news to learn that your alignment scheme would be able to solve a very large complexity class.
I agree with the gist: it implies that arguments about the equilibrium policy don’t necessarily translate to real models. I disagree, though, that this is necessarily bad news for the alignment scheme; it just means you need to find some guarantees that work even when you’re not at equilibrium.