I don’t think that using a simplicity prior is enough to rescue VNM / utility as a good capture of what it means to be “agentic”. Given a transformer model, it’s probably possible to find a reasonably concise energy function (probably of a similar[1] OOM of complexity as the model weights themselves) whose minimization corresponds to executing forward passes of the transformer. However, this energy function wouldn’t tell you much about what the personas simulated by the model “want” or how agentic they are, since the energy function is expressed in the ontology of model weights and activations, not an agent’s beliefs / goals.
It seems possible to construct mathematical objects with the type signature of a utility function that meaningfully compress a system’s behavior, without those objects telling you much about the long-term behavior / goals of the system.
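To make the energy-function point concrete, here is a minimal sketch, assuming a toy two-layer ReLU network with made-up weights standing in for a transformer. The quadratic `energy` below is an illustrative construction (an assumption, not something specified in this discussion): it is a single scalar over the activations `h` and `y` that is zero exactly at the forward-pass values, so plain gradient descent on it "executes" the forward pass. Negate it and it has the type signature of a utility function, and it takes about as many parameters as the network itself, yet it is phrased entirely in terms of weights and activations and says nothing about what any simulated persona wants.

```python
# Toy illustration: an "energy" over activations whose minimizer is the forward pass.
# The network, weights, and input below are hypothetical placeholders.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(W1, W2, x):
    """Ordinary forward pass of a two-layer ReLU network."""
    h = relu(matvec(W1, x))
    return h, matvec(W2, h)

def energy(W1, W2, x, h, y):
    """Scalar energy over activations (h, y); it is zero exactly at the forward-pass values."""
    a = relu(matvec(W1, x))  # what the hidden layer "should" be
    b = matvec(W2, h)        # what the output "should" be, given the current h
    return (0.5 * sum((hi - ai) ** 2 for hi, ai in zip(h, a))
            + 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, b)))

def minimize_energy(W1, W2, x, steps=2000, lr=0.1):
    """Plain gradient descent on the energy, starting from all-zero activations."""
    h = [0.0] * len(W1)
    y = [0.0] * len(W2)
    a = relu(matvec(W1, x))
    for _ in range(steps):
        r = [yi - bi for yi, bi in zip(y, matvec(W2, h))]  # output residual y - W2 h
        # dE/dh = (h - a) - W2^T r ;  dE/dy = r
        grad_h = [(h[j] - a[j]) - sum(W2[i][j] * r[i] for i in range(len(W2)))
                  for j in range(len(h))]
        h = [hj - lr * g for hj, g in zip(h, grad_h)]
        y = [yi - lr * ri for yi, ri in zip(y, r)]
    return h, y

# Hypothetical weights and input, chosen only for illustration.
W1 = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
W2 = [[1.0, -0.5, 0.2], [0.4, 0.9, -1.2]]
x = [1.0, 2.0]

if __name__ == "__main__":
    print("forward pass:     ", forward(W1, W2, x))
    print("energy minimizer: ", minimize_energy(W1, W2, x))
```

The two printed results should agree to within floating-point tolerance. Because this energy is a convex quadratic in the activations, gradient descent with a small step size converges to the forward-pass values; the construction generalizes to any deterministic layered computation, which is why such an object compresses behavior without revealing goals.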
I agree it’s important to think about power-accumulating patterns, but don’t see any particular reason to bring VNM into it. I also don’t think that “power accumulation”, “consequentialism”, and “simultaneously emerging competence across many different domains” go hand in hand as much as your DMV example implies. E.g., it seems totally plausible to me that an outwardly sclerotic DMV that never goes out of its way to help the public could still have tight internal coordination and close ranks to thwart hostile management, and that an outwardly helpful / flexible DMV that focuses on the spirit of the law might fail to do so.
Similarly, do top politicians seem to have particularly “consequentialist” cognitive styles[2]? If consequentialist thinking and power accumulation actually do go hand in hand, then we should expect top politicians to be disproportionately consequentialist. But if I think about specific cognitive motions that I associate with the EY-ish notion of “consequentialism”, I don’t think top politicians are particularly inclined towards such motions. E.g., how many of them “actively work on becoming ever more consequentialist”? Do they seem particularly good at having coherent internal beliefs? Or a wide range of competence in many different (seemingly) unrelated domains?
[1] I expect you could object that a meaningful utility function should be even more compressed than the agent’s entire policy. I also expect you could create approximate energy functions that are more compressed than the full policy, and that such energy functions would still not tell you much about long-term behavior.
[2] I talk about “cognitive styles” here to avoid the obvious red herring where people say “well, their actions have big consequences which the politicians were systematically steering the world towards, so they must be doing some sort of effective consequentialism”, by which logic COVID is being “consequentialist” when it does things like infecting billions of people and mutating around vaccine protections.
Given a transformer model, it’s probably possible to find a reasonably concise energy function (probably of a similar OOM of complexity as the model weights themselves) whose minimization corresponds to executing forwards passes of the transformer. However, this [highly compressive] energy function wouldn’t tell you much about what the personas simulated by the model “want” or how agentic they were, since the energy function is expressed in the ontology of model weights and activations, not an agent’s beliefs / goals. [This has] the type signature of a utility function, that meaningfully compress a system’s behavior, without… telling you much about the long term behavior / goals of the system.
When I think about the powerful AGI taking over the lightcone, I can definitely see it efficiently juggling familiar resources between nodes in space. E.g., it’ll want to build power collectors of some description around the sun and mine the asteroids. I can understand that AGI as a resource inventory whose feelers grow its resource stocks with time. The AGI’s neural network can also be accurately modeled as an energy function being minimized, expressed in terms of neural network stuff instead of in familiar resources.
I wouldn’t be terribly surprised if something similar was true for human brains, too. I can model people as steadily accruing social-world resources, like prestige, propriety, money, attractiveness, etc. There’s perhaps also some tidy neural theory, expressed in an alien mathematical ontology, that very compactly predicts an arbitrary actual brain’s motor outputs.
I guess I’m used to modeling people as coherent behavioral profiles with respect to social resources because social resources are an abstraction I have. (I don’t know what given social behaviors would imply about neural outputs expressed in wholly neural ontology, if anything.) If I had some other suite of alien mathematical abstractions that gave me precognitive foresight into people’s future motor outputs, and I could practically operate those alien abstractions, I’d probably switch over to entirely modeling people that way instead. Until I have those precog math abstractions, I have to keep modeling people in the ontology of familiar features, i.e. social resources.
It seems totally plausible to me that an outwardly sclerotic DMV that never goes out of its way to help the public could still have tight internal coordination and close ranks to thwart hostile management, and that an outwardly helpful / flexible DMV that focuses on the spirit of the law might fail to do so.
I completely agree, or at least that isn’t a crux for me here. I’m confused about the extent to which I should draw inferences about AGI behavior from my observations of large human organizations. I feel like that’s the wrong thing to analogize to. Like, if you can find ~a human brain via gradient descent, you can find a different better nearby brain more readily than you can find a giant organization of brains satisfying some behavioral criteria. Epistemic status: not very confident. Anyways, the analogy between AGI and organizations seems weak, and I didn’t intend for it to be a more-than-illustrative, load-bearing part of the post’s argument.
Similarly, do top politicians seem to have particularly “consequentialist” cognitive styles? If consequentialist thinking and power accumulation actually do go together hand in hand, then we should expect top politicians to be disproportionately very consequentialist. But if I think about specific cognitive motions that I associate with the EY-ish notion of “consequentialism”, I don’t think top politicians are particularly inclined towards such motions. E.g., how many of them “actively work on becoming ever more consequentialist”? Do they seem particularly good at having coherent internal beliefs? Or a wide range of competence in many different (seemingly) unrelated domains?
I think the model takes a hit here, yeah… though I don’t wholly trust my own judgement of top politicians, for politics-is-the-mindkiller reasons. I’m guessing there’s an elephant in the brain thing here where, like in flirting, you have strong ancestral pressures to self-deceive and/or maintain social ambiguity about your motives. I (maybe) declare, as an ex post facto epicycle, that human tribal politics is weird (like human flirting and a handful of other ancestral-signaling-heavy domains).
Business leaders do strike me as disproportionately interested in outright self-improvement and in explicitly improving the efficiency of their organization and their own work lives. Excepting the above epicycles, I also expect business leaders to have notably-better-than-average internal maps of the local territory and better-than-average competence in many domains. Obviously, there are some significant confounds there, but still.