Thanks for your comment! I think it’s slightly missing the point though. Let me explain.
One silly argument would be: “GPT-3 is pretty ‘general’, so we should call it ‘AGI’. And GPT-3 is not dangerous. Ergo ‘AGI’ is not dangerous”.
This is a silly argument because it’s just semantics. Agent-y-John-von-Neumann-AGI is possible, and it’s dangerous (i.e. prone to catastrophic out-of-control-misaligned-AGI accidents), and by default sooner or later somebody is going to build it (because it’s scientifically exciting, and there are many actors all over the world who can do so, etc.). That’s a real problem. Whether or not GPT-3 qualifies as “general” has nothing to do with that problem!
In right-column-vs-left-column terms, I claim there are systems (e.g. agent-y-John-von-Neumann-AGI) that are definitely firmly 100% in the right column in every respect, and I claim that such systems are super-dangerous, and that people will nevertheless presumably start messing around with them anyway at some point. Meanwhile, in other news, we can also imagine systems that are both safe and arguably have certain right-column aspects. Maybe language models are an example. OK sure, that’s possible. But those aren’t the systems I want to talk about here.
OK, then a more sophisticated argument would be: “Future language models will be both safe and super-duper-powerful, indeed so powerful that they will change the world, and indeed they’ll change it so much that it’s no use thinking ahead further than that step. Instead, we can basically delegate the problem of ‘what is to be done about people making dangerous agent-y-John-von-Neumann-AGI’ to our AI-empowered descendants [or AI-empowered future selves, depending on your preferred timelines]. Let them figure it out!”
A priori, this could be true, but I happen to think it’s false, for reasons that I won’t get into here. Instead, I think future language models will be moderately useful for future humans—just as computers and zoom and arxiv and github and so on are moderately useful for current humans. (Language models might be useful for AGI safety research even today, for all I know. I personally found GPT-3-assisted-brainstorming to be unhelpful when I tried it, but I didn’t try very hard, and that was a whole year ago, i.e. ancient history by language model standards.) I don’t think future language models will be so radically transformative as to significantly change our overall situation with respect to the problem of future people building agent-y-John-von-Neumann-AGIs.
(Or if they do get that radically transformative, I think it would be because future programmers, with new insights, found a way to turn language models into something more like an agent-y-John-von-Neumann-AGI—and in particular, something comparably dangerous to agent-y-John-von-Neumann-AGI.)