While I definitely agree we over-anthropomorphize LLMs, I think LLMs are actually much better from an alignment standpoint than, say, pure RL agents. The major benefits are that they aren’t agents out of the box and, perhaps most importantly, that they primarily operate through natural language, which turns out to be a fairly effective way to get an LLM to do what you want.
Yeah, they work well enough at this (~human) level. But no current alignment technique scales to superhuman AI. I’m worried that basically all of the doom flows through an asymptote of imperfect alignment, and I can’t see how that doesn’t happen, short of some “miracle”.