I have no intention of arguing this point to death. After all, it’s better to do “too much” LMA alignment research than “too little”. But I would definitely suggest reaching out to AGI labs’ safety teams, perhaps privately, and at least trying to find out where they stand, rather than just assuming that they don’t do LMA alignment work.
Connor Leahy proposed banning LLM-based agents here: https://twitter.com/NPCollapse/status/1678841535243202562. In the context of this proposal (which I agree with), a potentially high-leverage thing to work on now is a detection algorithm for LLM API usage patterns that indicate agent-like usage. This may be difficult, though, if users interleave calls to the OpenAI API, the Anthropic API, and local LLaMA 2 inference within their LMA.
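To make the idea concrete, here is a minimal sketch of one such heuristic, assuming a provider can see per-key call logs: flag keys whose calls form rapid chains in which each prompt embeds the previous completion, a signature of a scaffold feeding model output back into itself. Everything here (the names, thresholds, and chaining signal) is hypothetical and illustrative, not any provider’s actual detection system.

```python
# Illustrative sketch only: a toy heuristic for flagging agent-like LLM API
# usage from a single provider's call logs. All names and thresholds are
# hypothetical.
from dataclasses import dataclass

@dataclass
class ApiCall:
    timestamp: float   # seconds since epoch
    prompt: str
    completion: str

def looks_agent_like(calls: list[ApiCall],
                     max_gap_seconds: float = 30.0,
                     min_chain_length: int = 5) -> bool:
    """Flag a key whose calls form rapid chains in which each prompt
    quotes the previous completion -- a signature of an automated loop
    rather than a human reading and typing."""
    calls = sorted(calls, key=lambda c: c.timestamp)
    chain = 1
    for prev, cur in zip(calls, calls[1:]):
        gap = cur.timestamp - prev.timestamp
        # Fast follow-up whose prompt embeds the prior output suggests a
        # scaffold feeding completions back into the context window.
        if (gap <= max_gap_seconds
                and prev.completion
                and prev.completion[:200] in cur.prompt):
            chain += 1
            if chain >= min_chain_length:
                return True
        else:
            chain = 1
    return False
```

Note that this single-provider view is exactly what the interleaving trick defeats: a scaffold that alternates between OpenAI, Anthropic, and a local LLaMA 2 never shows a long chain to any one provider.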
However, if Meta, EleutherAI, Stability AI, etc. won’t stop developing more and more powerful “open” LLMs, agents are inevitable anyway.