I wonder if this entails that RLHF, while currently useful for capabilities, will eventually become an alignment tax. Namely, OpenAI might have text evaluators discourage the LM from writing self-calling, agenty-looking code.
So in thinking about alignment futures that lie at the limit of RLHF, these feel like two fairly different forks of that future.