What’s the relationship between the propositions “one AI lab [has / will have] a big lead” and “the alignment tax will be paid”? (Or: in a possible world where lead size is bigger/smaller, how does this affect whether the alignment tax is paid?)
It depends on the source of the lead, so “lead size” or “lead time” is probably not a good node for AI forecasting/strategy.
Miscellaneous observations:
To pay the alignment tax, it helps to have more time until risky AI is developed or deployed.
To pay the alignment tax, holding total time constant, it helps to have more time near the end—that is, more time with near-risky capabilities (for knowing what risky AI systems will look like, and for empirical work, and for aligning specific models).
If all labs except the leader become slower or less capable, that is prima facie good (at least if the leader appreciates misalignment risk and will stop before developing/deploying risky AI).
If the leading lab becomes faster or more capable, that is prima facie good (again, at least if the leader appreciates misalignment risk and will stop before developing/deploying risky AI), unless it causes other labs to become faster or more capable too (e.g. because they can see what works, because they seem straightforwardly incentivized to speed up, or because they decide to speed up in order to influence the leader). Note that a bigger lead could plausibly decrease race-y-ness: some models of AI racing suggest that a lab that is far behind avoids taking risks, essentially giving up (see the toy sketch below); but that result rests on the currently-false assumption that labs are fully aware of misalignment risk.
If labs all coordinate to slow down, that's good insofar as it increases total time, great if they can continue to go slowly near the end, and potentially bad if it creates a hardware overhang (compute keeps improving during the slowdown, so once labs scale up again the end goes more quickly than it would by default).
(Note also the distinction between current lead and ultimate lead. Roughly, the former is what we can observe and the latter is what we care about.)
(If paying the alignment tax looks less like a thing that happens for one transformative model and more like something that occurs gradually in a slow takeoff to avert Paul-style doom, things are more complex; in particular, there are endogeneities such that labs may have additional incentives to pursue capabilities.)
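To make the "far-behind labs give up on risky racing" intuition concrete, here is a minimal toy sketch in Python. This is my own illustration, not the equations of any published racing model; the payoff values, the skimp-for-speed parameterization, and the deterministic win rule are all assumptions. Each lab chooses how much safety to skimp; skimping buys effective capability, and the eventual deployer's skimping is what creates disaster risk. A lab that is far behind finds that skimping can no longer change who wins, so, if it fully internalizes the risk, its best response is not to skimp at all.

```python
# Toy two-lab racing sketch (illustrative only; all parameter values are assumptions).
# Each lab picks a "skimp" level in [0, 1]: skimping on safety buys effective
# capability, but the lab that deploys first creates disaster risk in proportion
# to how much it skimped.

import numpy as np

VALUE_OF_WINNING = 1.0  # payoff to whichever lab deploys first (assumed)
DISASTER_LOSS = 2.0     # loss if the deployed system is misaligned (assumed)
SPEED_PER_SKIMP = 1.0   # effective capability gained per unit of skimped safety (assumed)


def expected_utility(my_cap, their_cap, my_skimp, their_skimp):
    """Utility of 'my' lab, assuming it fully internalizes misalignment risk."""
    my_eff = my_cap + SPEED_PER_SKIMP * my_skimp
    their_eff = their_cap + SPEED_PER_SKIMP * their_skimp
    i_win = 1.0 if my_eff > their_eff else 0.0       # deterministic race, for simplicity
    p_disaster = my_skimp if i_win else their_skimp  # only the deployer's skimping matters here
    return i_win * VALUE_OF_WINNING - p_disaster * DISASTER_LOSS


def best_response_skimp(my_cap, their_cap, their_skimp=0.0):
    """Trailing lab's utility-maximizing skimp level, found by grid search."""
    grid = np.linspace(0.0, 1.0, 101)
    utils = [expected_utility(my_cap, their_cap, s, their_skimp) for s in grid]
    return grid[int(np.argmax(utils))]


# As the capability gap grows, the trailing lab's best response drops to zero
# skimping: skimping can no longer change who wins, so it "gives up" on risky racing.
for gap in [0.1, 0.3, 0.6, 1.0, 2.0]:
    print(f"gap={gap:.1f}  trailing lab's best skimp={best_response_skimp(0.0, gap):.2f}")
```

The threshold behavior is the point: once the gap exceeds what skimping can buy, the trailing lab's best response snaps to zero. That is the "giving up" effect, and it disappears if the lab does not actually internalize DISASTER_LOSS, i.e. does not appreciate misalignment risk.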
There should be no alignment tax, because improved alignment should always pay for itself, right? But currently "aligned" seems to be defined, institutionally, as "tries to not do anything." Why isn't Anthropic publicly competing with OpenAI on alignment? E.g., it looks like people are about to publicly replicate ChatGPT.