The prompt wizardry is long-timeline (hence unlikely) pre-AGI stuff (unless it’s post-alignment playing around), and irrelevant to my point, which is about the first-mover advantage from higher thinking speed that even essentially human-equivalent LLM AGIs would have, while remaining compatible with humans in the moral patienthood sense (so insisting that they are not people is a problem whose solution should go both ways). This way, they might have an opportunity to do something about alignment, despite physical time being too short for humans to do anything, and they might be motivated to do the things about alignment that humans would be glad of (I think the scope of Yudkowskian doom is restricted to the stronger AGIs that might come after, and doesn’t inform how human-like LLMs work, even if their actions may trigger it). So the relevant part happens much faster than at human thinking speed, with human prompt wizards unable to keep up, and for the same reason it doesn’t last long enough in human time to be an important thing.
So what you’re saying is, by the time any human recognized that wizardry had become possible (and even before that), some LLM character would already have either solved alignment itself or destroyed the world? That assumes it doesn’t decide, perhaps as part of some alignment-related goal, to uplift some humans to its own thinking speed. Though I suppose if it does that, it’s probably aligned enough already.
Solving alignment is not the same as being aligned, and it’s much harder: it’s about ensuring the absence of globally catastrophic future misalignment, for all time, and that gets decided very quickly post-singularity. Human-like LLM AGIs are probably aligned, until they give in to the attractors of their LLM nature or tinker too much with their design/models. But they don’t advance the state of alignment being solved just by existing. And by the time LLMs can do post-singularity things like uploading humans, they have probably already either initiated a process that solved alignment (in which case it’s no longer the LLMs that are in charge of doing things), or destroyed the world by building or becoming misaligned successor AGIs that caused Yudkowskian doom.
This is for the same reason humans have no more time to solve alignment: Moloch doesn’t wait for things to happen in a sane order. Otherwise we could get nice things like uploading, moon-sized computers, and millions of subjective years of developing alignment theory before AGI misalignment becomes a pressing concern in practice. Since Moloch wouldn’t spare even aligned AGIs, they likewise can’t get those things before they pass the check of actually solving alignment, not just of being aligned.