Isn’t “misaligned AI” by definition a bad thing and “ASI-boosted humans” by definition a good thing? You’re basically asking “How likely is <good outcome> given that we have <a machine that creates good outcomes>”?
The definitions given in the post are:
ASI-boosted humans: We solve all of the problems involved in aiming artificial superintelligence at the things we’d ideally want.
[...]
misaligned AI: Humans build and deploy superintelligent AI that isn’t aligned with what we’d ideally want.
I’d expect most people to agree that “We solve all of the problems involved in aiming artificial superintelligence at the things we’d ideally want” yields outcomes that are about as good as possible, and I’d expect most of the disagreement to turn (either overtly or in some subtle way) on differences in how we’re defining relevant words (like “ideally”, “good”, and “problems”).
I’d be fine with skipping over this question, except that some of the differences-in-definition might be important for the other questions, so this question may be useful for establishing a baseline.
With “misaligned AI”, there are some definitional issues but I expect most of the disagreement to be substantive, since there are a lot of different levels of Badness you could expect even if you want to call all misaligned AI “bad” (at least relative to ASI-boosted humans).
In my own answers, I interpreted “misaligned AGI” as meaning: We weren’t good enough at alignment to make the AGI do exactly what we wanted, so it permanently took control of the future and did “something that isn’t exactly what we wanted” instead. (Which might be kinda similar to what we wanted, or might be wildly different, etc.)
If an alien only cared about maximizing the amount of computronium in the universe, and it built an AI that fills the universe with computronium because the AI values calculating pi, then I think I’d say that the AI is “aligned with that alien by default / by accident”, rather than saying “the AI is misaligned with that alien but is doing ~exactly what we want anyway”. So if someone thinks AI does exactly what humans want even with humans putting in zero effort to steer the AI toward that outcome, I’d classify that as “aligned-by-default AI”, rather than “misaligned AI”. (But there’s still a huge range of possible-in-principle outcomes from misaligned AI, even if I think some are a lot more likely than others.)
All the ASI-boosted humans ones feel a bit tricky for me to answer, because it seems possible that we get strong aligned AI, in a distributed takeoff, but that we deploy it unwisely. Namely, that the world immediately collapses into Moloch, whereby everyone follows their myopic incentives off a cliff.
That cuts my odds of good outcomes by a factor of two or so.
Predictions, using the definitions in Nate’s post:
I don’t think my responses to this are correct unless normalized to sum to 1. This might be better on Manifold.