I mean, that’s because this is just a sketch. But a simple argument for why myopia is more natural than “obey humans” is that, if we don’t care about competitiveness, we already know how to build myopic optimizers, whereas we don’t know how to build an optimizer to “obey humans” at any level of capabilities.
Furthermore, LCDT is a demonstration that we can at least reduce the complexity of specifying myopia to the complexity of specifying agency. I suspect we can get much better upper bounds on the complexity than that, though.
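To spell out the reduction (a sketch of my own, not pseudocode from the LCDT write-up; the graph interface, `prior`, and `utility` below are all placeholders): the only ingredient an LCDT planner needs on top of an ordinary causal decision procedure is a predicate saying which nodes of its world-model count as agents, so the specification cost of myopia is bounded by the specification cost of that predicate.

```python
def lcdt_plan(actions, graph, is_agent, prior, utility):
    """Sketch of an LCDT decision step (simplified): ordinary expected-utility
    maximisation over a causal graph, except that every node flagged by
    `is_agent` (including future copies of the planner) is drawn from the
    prior rather than conditioned on the chosen action."""
    def expected_utility(action):
        values = {}
        for node in graph.topological_order():
            if is_agent(node):
                # Cut the influence of the decision: agents behave as under the prior.
                values[node] = prior(node)
            else:
                # Non-agent mechanisms respond to the action and to their parents as usual.
                values[node] = node.compute(action, {p: values[p] for p in node.parents})
        return utility(values)
    return max(actions, key=expected_utility)
```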
It’s an interesting idea, but are you confident that LCDT actually works? E.g. have you thought more about the issues I talked about here and concluded they’re not serious problems?
I still don’t see how we could get e.g. an HCH simulator without agentic components (or the simulator’s qualifying as an agent). As soon as an LCDT agent expects that it may create agentic components in its simulation, it’s going to reason horribly about them (e.g. assuming that any adjustment it makes to other parts of its simulation can’t possibly impact their existence or behaviour, relative to the prior).
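To make that failure mode concrete, here is a toy example (entirely my own construction; the prompts, numbers, and the crude “prior = marginal over prompts” are all placeholders):

```python
# A planner writes a prompt; a simulated "agent" node reads the prompt and
# answers; utility is 1 if the answer hits the target. Under LCDT the agent
# node is treated as fixed at the prior, independent of the chosen prompt.

PROMPTS = ["ask_sum", "ask_product"]
TARGET = 6  # we want the simulated agent to answer 2 * 3 = 6

def simulated_agent(prompt):
    # True causal behaviour of the agentic component inside the simulation.
    return 2 + 3 if prompt == "ask_sum" else 2 * 3

def prior_over_agent():
    # LCDT cuts the link from the decision to the agent, so the agent's answer
    # is taken from the prior (here, the marginal over prompts) instead.
    return sum(simulated_agent(p) for p in PROMPTS) / len(PROMPTS)

def lcdt_expected_utility(prompt):
    answer = prior_over_agent()       # independent of the chosen prompt
    return 1.0 if answer == TARGET else 0.0

def cdt_expected_utility(prompt):
    answer = simulated_agent(prompt)  # ordinary causal reasoning
    return 1.0 if answer == TARGET else 0.0

print({p: lcdt_expected_utility(p) for p in PROMPTS})  # both prompts look equally useless
print({p: cdt_expected_utility(p) for p in PROMPTS})   # only "ask_product" achieves the goal
```

Every prompt looks equally (un)promising to the LCDT planner, because it assumes its choice can’t move the agentic node off the prior; that’s the sense in which I expect it to reason horribly about any agentic components it creates.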
I think LCDT does successfully remove the incentives you’re aiming to remove. I just expect it to be too broken to do anything useful. I can’t currently see how we could get the good parts without the brokenness.
What are you referring to here?