gwern comments on Shutdown-Seeking AI

gwern 6 Jun 2023 1:30 UTC

> With all due respect to Gwern, repeating claims that work has already been done and then refusing to substantiate them is an epistemic train wreck.

I realize it may sometimes seem like I have a photographic memory and have bibliographies tracking everything so I can produce references on demand for anything, but alas, it is not the case. I only track some things in that sort of detail, and I generally prioritize good ideas. Proposals for interruptibility are not those, so I don’t. Sorry.

> Also, it is clear that Gwern did not read the linked research about language agents, since it is simply false, and obviously so, to claim that the generative agents in the Stanford study are the same thing as Gato.

I did read the paper, because I enjoy all the vindications of my old writings about prompt programming & roleplaying by the recent crop of survey/simulation papers as academics finally catch up with the obvious DRL interpretations of GPT-3 and what hobbyists were doing years ago.

However, I didn’t need to, because it just uses… GPT-3.5 via the OA API. Which is the same thing as Gato, as I just explained: it is the same causal-decoder dense quadratic-attention feedforward Transformer architecture trained with backprop on the same agent-generated data like books & Internet text scrapes (among others) with the same self-supervised predictive next-token loss which will induce the same capabilities. Everything GPT-3.5 does* Gato could do in principle (with appropriate scaling etc) because they’re the same damn thing. If you can prompt one for various kinds of roleplaying which you then plug into your retrieval & game framework, then you can prompt the other too—because they’re the same thing. (Not that there is any real distinction between retrieval and other memory/attention mechanisms like a very large context window or recurrent state in the first place; I doubt any of these dialogues would’ve blown through the GPT-4 32k window, much less Anthropic’s 100k etc.) Why could Shawn Presser & I finetune a reward-conditioned GPT-2 to play chess back in Jan 2020? Because they’re the same thing: there’s no difference between a ‘RL GPT’ and a ‘LLM GPT’. It’s fundamentally a property of the data and not the arch.
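To make the "property of the data, not the arch" point concrete, here is a minimal Python sketch under loud assumptions: a toy bigram counter stands in for the Transformer, and the `<1-0>`/`<0-1>` outcome-prefix format is hypothetical, chosen for illustration rather than taken from the GPT-2 chess finetuning. One next-token loss and one decoding loop handle English prose and reward-conditioned game records alike; "conditioning on reward" amounts to putting the desired outcome in the prompt.

```python
import math
from collections import Counter, defaultdict


def to_training_text(result: str, moves: list[str]) -> str:
    """Hypothetical reward-conditioned format: the game outcome is prepended
    as an ordinary token, so next-token training learns p(moves | outcome)."""
    return f"<{result}> " + " ".join(moves)


def fit_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Toy stand-in for a causal LM: counts of each next token given the previous one."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for doc in corpus:
        toks = doc.split()
        for prev, nxt in zip(toks, toks[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_nll(counts, doc: str, vocab_size: int, alpha: float = 1.0) -> float:
    """Average negative log-likelihood of each token given its predecessor,
    with add-alpha smoothing. Nothing here asks whether doc is prose or a game."""
    toks = doc.split()
    total = 0.0
    for prev, nxt in zip(toks, toks[1:]):
        c = counts.get(prev, Counter())
        p = (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)
        total -= math.log(p)
    return total / max(1, len(toks) - 1)


def greedy_continue(counts, prompt: str, n: int) -> list[str]:
    """Condition on a prompt (here, the desired outcome) and decode greedily."""
    out = prompt.split()
    for _ in range(n):
        c = counts.get(out[-1])
        if not c:  # no observed successor: stop early
            break
        out.append(c.most_common(1)[0][0])
    return out[len(prompt.split()):]


# Mixed corpus: ordinary text plus two reward-labeled games, same training procedure.
corpus = [
    "the cat sat on the mat",
    to_training_text("1-0", ["e4", "e5", "Qh5", "Nc6", "Bc4", "Nf6", "Qxf7#"]),
    to_training_text("0-1", ["f3", "e6", "g4", "Qh4#"]),
]
model = fit_bigram(corpus)
vocab_size = len({t for doc in corpus for t in doc.split()})

# Prompting with the Black-win token yields the Black-win game's moves.
print(greedy_continue(model, "<0-1>", 10))  # ['f3', 'e6', 'g4', 'Qh4#']
```

The design choice is deliberately minimal: swapping the bigram table for a Transformer changes how well p(next | context) is estimated, not what the loss or the conditioning trick is.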

* Not that you were referring to this, but even fancy flourishes like the second phase of RLHF training in GPT-3.5 don’t make GPT-3.5 & Gato all that different. The RLHF and other kinds of small-sample training only tweak the Bayesian priors of the POMDP-solving that these models learn, rather than creating any genuinely new capabilities/knowledge (which is why you could know in advance that jailbreak prompts would be hard to squash and that all of these smaller models like Llama were being heavily overhyped, BTW).