I would say that current LLMs, when prompted and RLHF’d appropriately, and especially when also strapped into an AutoGPT-type scaffold/harness, DO want things. I would say that wanting things is a spectrum and that the aforementioned tweaks (appropriate prompting, AutoGPT, etc.) move the system along that spectrum. I would say that future systems will be even further along that spectrum. IDK what Nate meant, but on my charitable interpretation he simply meant that they are not very far along the spectrum compared to e.g. humans or prophesied future AGIs.
It’s a response to “LLMs turned out to not be very want-y, when are the people who expected ‘agents’ going to update?” because it’s basically replying “I didn’t expect LLMs to be agenty/wanty; I do expect agenty/wanty AIs to come along before the end, and indeed we are already seeing progress in that direction.”
To the people saying “LLMs don’t want things in the sense that is relevant to the usual arguments...” I recommend rephrasing to be less confusing: Your claim is that LLMs don’t seem to have preferences about the training objective, or preferences that are coherent over time, unless hooked up to a prompt/scaffold that explicitly tries to get them to have such preferences. I agree with this claim, but don’t think it’s contrary to my present or past models.