IMO, the fusion power generator scenario is really not a misalignment story, but a story about the democratization of very dangerous technology leading to a vulnerable world. Even conditioning on the alignment problem being solved, it's still very possible for the world to end through misuse. The problem you're sketching out here is perhaps an interface problem, but I think it's closer to a misuse/vulnerable-world problem than anything else, and thus the corresponding solutions will look different.
Today, those solutions would look like not open-sourcing or open-weighting the model and strongly controlling the affordances of released models; longer term, they would probably mean figuring out how to harden models against fine-tuning for, say, bioweapons or fusion research.