Slowing down LLM progress would be dangerous, since other approaches such as RL agents could overtake LLMs before appearing dangerous.
This seems misleading to me & might be a false dichotomy. It’s not LLMs or RL agents. I think we’ll (unfortunately) build agents on the basis of LLMs & the capabilities they have. Every bit of additional progress on LLMs gives these agents more capabilities faster, with less time for alignment. They will be (and are!) built based on the mere (perceived) incentives of everybody involved & the unilateralist’s curse. (See esp. Gwern’s Tool AIs Want to Be Agent AIs.) I can see that such agents have interpretability advantages over RL agents, but since RL agents seem far off with less work going into them, I don’t get why we should race regarding LLMs & LLM-based agents.
I’m personally not sure whether “inherently oracles” is accurate for current LLMs (both before & after RLHF), but it seems simply false when considering plugins & AutoGPT (besides other recent developments).
I was unclear. I meant that basic LLMs are oracles. The rest of what I said was about the agents made from LLMs that you refer to. They are most certainly agents, not oracles. But they’re far better for alignment than RL agents. See my linked post for more on that.