I posted a somewhat similar response to MSRayne, with the exception that what you accidentally summon is not an agent with a utility function, but something that merely tries to appear to be one, and nevertheless tricks you into making some big mistake.
Here, if I understand correctly, what you get is a genuine agent that persists across prompts: it has some internal value function that assigns a value after each prompt, and it acts accordingly. That doesn't seem incredibly unlikely. Nothing in the process of evolution necessarily had to make humans themselves optimizers, yet it happened anyway, because being an optimizer performed best with respect to the overarching goal of reproduction. Such an AI would still probably have to convince the people communicating with it to grant it "true" agency, independent of the user's inputs; gaining that agency looks like an instrumental goal in this case.