Also, this doesn’t actually depend on the specific training procedure of autoregressive LLMs, namely backpropagation with token-by-token cross-entropy loss.
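For concreteness, the procedure being referenced is roughly the following (a minimal sketch, assuming PyTorch; `model` here is a hypothetical toy network mapping token ids to logits, not anyone's actual codebase):

```python
import torch
import torch.nn.functional as F

def training_step(model, tokens, optimizer):
    """One autoregressive update. tokens: LongTensor of shape (batch, seq_len)."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each position predicts the next token
    logits = model(inputs)                           # (batch, seq_len - 1, vocab)
    # Token-by-token cross-entropy: every position gets its own loss term,
    # so the gradient signal is dense and purely local to next-token accuracy.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()  # backpropagation through the whole network
    optimizer.step()
    return loss.item()
```

The point is just that the supervision is dense and local: every position is pushed toward next-token accuracy, with no term that rewards long-horizon outcomes.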
Agreed.
I do have some concerns about how far you can push the wider class of “predictors” in some directions before the process starts selecting for generalizing instrumental behaviors, but there isn’t anything fundamentally unique about autoregressive NLL-backpropped prediction.
Possibly? I can’t tell if I endorse all the possible interpretations. When I say high instrumentality, I tend to focus on two things:

1. The model is strongly incentivized to learn internally-motivated instrumental behavior (e.g. because the reward it was trained on is extremely sparse, so the model must have learned some internal structure encouraging intermediate instrumental behavior useful during training; see the sketch below).
2. Those internal motivations are less constrained and may occupy a wider space of weird options.

#2 may overlap with the kind of alienness you mean, but I’m not sure I would focus on the alienness of the world model rather than of the learned values (in the context of how I think about “high instrumentality” models, anyway).
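To make the sparsity in #1 concrete, here is a minimal REINFORCE-style sketch where the only reward arrives at the end of a long episode (`policy`, `env`, and the three-value `env.step` are hypothetical stand-ins, not any particular library’s API):

```python
import torch

def reinforce_episode(policy, env, optimizer):
    """One policy-gradient update on an episode whose only reward is terminal."""
    log_probs = []
    obs, done = env.reset(), False
    while not done:
        dist = policy(obs)                    # Categorical distribution over actions
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action)  # reward stays 0 until the final step
    # A single terminal scalar is credited to every action in the episode,
    # saying nothing about which intermediate behaviors actually mattered.
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under that kind of signal, whatever intermediate structure makes episodes succeed has to be invented internally by the policy, which is the incentive toward instrumental behavior I have in mind.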