The post seems to be written straight in your “internal language of thought”: many statements you make without explanation (and which are therefore obvious to you) are not obvious to me, and I often don’t understand the terminology. I understood maybe only 30% of this post. If this post is intended for the public, I recommend either editing it (if you want, I can collaborate with you on a draft to sort out all the places I didn’t understand) or adding a disclaimer about the prerequisite reading required to understand it (the “Simulators” post is not sufficient; I’ve read it).
It can be thought of as sampling the minimally collapsed distribution of actions compatible with the conditions rather than grabbing at some goal-directed option just because it’s technically permitted.
Looks like this simulator/agent should adhere to the principle of maximum entropy; I think it’s worth spelling this out.
I understood maybe only 30% of this post. If this post is intended for the public, I recommend either editing it (if you want, I can collaborate with you on a draft to sort out all the places I didn’t understand) or adding a disclaimer about the prerequisite reading required to understand it (the “Simulators” post is not sufficient; I’ve read it).
That could be helpful. I’m pretty clearly suffering from some illusion of transparency here; I can’t easily predict the direction of the confusion.
The most related posts to read for background would be Implied “utilities” of simulators are broad, dense, and shallow (which I see you’ve now read) and Instrumentality makes agents agenty. There’s also a much bigger post I submitted to the AI alignment awards that goes into more depth, but I haven’t gotten around to publishing that quite yet.
There’s also Simulators, constraints, and goal agnosticism: porbynotes vol. 1, but those are some much earlier thoughts on the topic, some of which I don’t entirely endorse anymore, and it’s explicitly braindumping that’s not optimized for any particular audience. And it’s really long.
Looks like this simulator/agent should adhere to the principle of maximum entropy; I think it’s worth spelling this out.
I avoided this for now because I can’t point to exactly how maximum entropy is sufficient for what I intend by “minimally collapsed.”
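(To spell out the standard statement being referenced, with \(\mathcal{C}\) standing in for the set of distributions compatible with all the conditions the predictor is aware of; the notation here is just for illustration:)

\[
p^* \;=\; \arg\max_{p \,\in\, \mathcal{C}} H(p), \qquad H(p) \;=\; -\sum_{a} p(a)\,\log p(a),
\]

where \(a\) ranges over the possible outputs/actions.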
Naively selecting from the maximum entropy distribution (as narrowed by all the conditions the predictor is aware of) still permits the model to collapse reflective predictions in a way that allows internally motivated goal-directed behavior (leaving aside whether it’s probable), because it’s aware of the reflective nature of the prediction.
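(Roughly, the issue is that in the reflective case the predicted process can depend on the prediction itself, so a self-consistent output has to satisfy something like a fixed-point condition rather than being pinned down by the conditions alone:

\[
p \;=\; F(p), \qquad F(p)(a) \;=\; P\big(a \,\big|\, \mathcal{C},\ \text{the predictor outputs } p\big).
\]

When \(F\) has multiple fixed points that are all compatible with \(\mathcal{C}\), maximum entropy over \(\mathcal{C}\) doesn’t determine which one gets picked, and that leftover choice is exactly the degree of freedom a goal-directed process could exploit.)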
In other words, to get to what I mean by “minimally collapsed,” there seems to be some additional counterfactual surgery required. For example, the model could output the distribution that it would output if it knew it did not influence the prediction. Something like the predictor punting the prediction to a counterfactual version of itself that then predicts the original predictor’s output, assuming the predictor behaves like a strictly CDT agent. I think this has the right shape (edit: okay, pretty sure that’s wrong now, more thinky required), but it’s pretty contorted.
All that said, I will add an extra footnote.
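(Restating that “for example” in symbols, with the same caveats: the target is roughly

\[
q \;=\; \arg\max_{p \,\in\, \mathcal{C}_0} H(p),
\]

where \(\mathcal{C}_0\) is \(\mathcal{C}\) with the reflective information, i.e. “my output feeds back into the predicted process,” counterfactually removed.)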
Naively selecting from the maximum entropy distribution (as narrowed by all the conditions the predictor is aware of) still permits the model to collapse reflective predictions in a way that allows internally motivated goal-directed behavior (leaving aside whether it’s probable), because it’s aware of the reflective nature of the prediction.
Hm, if we apply the maximum entropy principle universally, aren’t we also obliged to apply it reflectively, i.e., to have the agent model itself as a maximum-entropy (active inference) agent? BTW, this is exactly the setup explored by Ramstead et al. (2022).
In other words, to get to what I mean by “minimally collapsed,” there seems to be some additional counterfactual surgery required. For example, the model could output the distribution that it would output if it knew it did not influence the prediction.
Looks more like a suitable inductive bias and/or some other bias is needed rather than causal surgery.
Hm, if we apply the maximum entropy principle universally, aren’t we also obliged to apply it reflectively, i.e., to have the agent model itself as a maximum-entropy (active inference) agent?
If you precisely define what it means to apply it “universally” such that it gets you the desired behavior, sure. And to be clear, I’m not saying that’s a hard/impossible problem or anything like that; it’s just not directly implied by all things which match the description “follows the principle of maximum entropy.”
Looks more like a suitable inductive bias and/or some other bias is needed rather than causal surgery.
If you were actually trying to implement this, yes, I wouldn’t recommend routing through weird counterfactuals. (I just bring those up as a way of describing the target behavior.)
In fact, because even the version I outlined in the added footnote can still suffer from collapse in the case of convergent acausal strategies across possible predictors, I would indeed strongly recommend pushing for some additional bias that gives you more control over how the distribution looks. I think that’s pretty tractable, too.
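(Purely as an illustration of the kind of bias I mean, not a specific proposal: something like selecting the distribution by minimum relative entropy to an explicit reference distribution \(p_{\mathrm{ref}}\),

\[
p^* \;=\; \arg\min_{p \,\in\, \mathcal{C}} D_{\mathrm{KL}}\!\big(p \,\|\, p_{\mathrm{ref}}\big),
\]

which reduces to plain maximum entropy when \(p_{\mathrm{ref}}\) is uniform, but gives you a knob for how the leftover freedom gets spent.)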