But every environment that isn’t perfectly known, and every “goal” that isn’t completely concrete, opens up error. That error then stacks upon error as any “plan” to interact with or modify reality adds another step.
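To make the compounding point concrete, here’s a toy sketch (the 95% per-step figure and the step counts are made-up assumptions, just to show the shape of the decay): if each inference or plan step only holds with some probability, the chance the whole chain survives falls off multiplicatively.

```python
# Toy illustration of compounding error in a multi-step plan.
# The per-step confidence (0.95) and the step counts are made-up numbers;
# the only point is that confidence in the whole chain decays multiplicatively.
per_step_confidence = 0.95
for steps in (1, 5, 10, 20, 50):
    chain_confidence = per_step_confidence ** steps
    print(f"{steps:2d} steps -> {chain_confidence:.1%} chance the whole plan holds")
```

At 95% per step, a 50-step plan holds up less than 8% of the time, which is the “error stacks upon error” worry in numbers.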
If the ASI can infer some materials-science breakthroughs from existing human knowledge and experimental data with a high degree of certainty, okay, I buy it.
What I don’t buy is that it can simulate enough actions and reactions with enough certainty to nail a large domain of things on the first try.
But I suppose that’s still sort of moot from an existential-risk perspective, because FOOM and sharp turns aren’t really a requirement.
But going from “inferring” the best move in tic-tac-toe to, say, “developing a unified theory of reality without access to supercolliders” is a stretch that doesn’t hold up to reason.
“Hands-on experience is not magic,” but neither is “superintelligence.” LLMs already hallucinate, any conceivable future iteration will still be bound by physics, and a few wrong assumptions compounded together can whiff a lot of hyperintelligent schemes.
It’s not a moot point, because a lot of the difficulty of the problem as stated here is the “iterative approaches cannot work” bit.
Well, to flesh that out: we could have an ASI that seems value-aligned and controllable... until it isn’t.
Or the social effects (deepfakes, for example) could ruin the world or land us in a dystopia well before actual AGI.
But that might be a bit orthogonal and in the weeds (specific examples of how we end up with x-risk or s-risk end scenarios without attributing magic powers to the ASI).
I think that scenario falls under the “worlds where iterative approaches fail” bucket, at least if prior to that we had a bunch of examples of AGIs that seemed and were value aligned and controllable, and the misalignment only showed up in the superhuman domain.
There is a different failure mode, which is “we see a bunch of cases of deceptive alignment in sub-human-capability AIs causing minor to moderate disasters, and we keep scaling up despite those disasters”. But that’s not so much “iterative approaches cannot work” as “iterative approaches do not work if you don’t learn from your mistakes”.