One hurdle for this plan is incentivizing developers to slap 20 layers of alignment strategies onto their generalist AI models. It may be a hard sell when they are trying to maximize power and efficiency to stay competitive.
You’ll probably need to convince them that not having such safeguards in place will lead to undesirable behavior (e.g., unprofitable behavior, bad PR, or bad customer reviews) well below the level of the apocalyptic scenarios that AI safety advocates normally talk about. Otherwise, they may not care.
My mainline success-model looks like this: the key actors know alignment is hard and coordinate to solve it. I’m focusing on this success-model until another success-model becomes mainline.
I’m more bullish on coordination between the key actors than a lot of the TAIS community.
I think the leading alignment-sympathetic actors, even slowed by the alignment tax, will still outpace the alignment-careless actors.
The assemblage might be cheaper to run than the primitives alone.
Companies routinely use cryptographic assemblages.
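To make the analogy concrete: a "cryptographic assemblage" here is something like encrypt-then-MAC or TLS, where independent, well-understood primitives are composed into a construction with stronger guarantees than any primitive alone. A minimal sketch, assuming the pyca/cryptography package is installed (key sizes and helper names are just illustrative):

```python
# Sketch: encrypt-then-MAC as an "assemblage" of two cryptographic primitives.
# Assumes the pyca/cryptography package (pip install cryptography).
import os

from cryptography.hazmat.primitives import hashes, hmac
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Compose primitive 1 (AES-CTR encryption) with primitive 2 (HMAC-SHA256)."""
    nonce = os.urandom(16)
    encryptor = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    mac = hmac.HMAC(mac_key, hashes.SHA256())
    mac.update(nonce + ciphertext)  # authenticate everything that gets sent
    return nonce + ciphertext + mac.finalize()


def open_(enc_key: bytes, mac_key: bytes, sealed: bytes) -> bytes:
    """Verify the MAC before touching the ciphertext, then decrypt."""
    nonce, ciphertext, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    mac = hmac.HMAC(mac_key, hashes.SHA256())
    mac.update(nonce + ciphertext)
    mac.verify(tag)  # raises InvalidSignature if anything was tampered with
    decryptor = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()


if __name__ == "__main__":
    enc_key, mac_key = os.urandom(32), os.urandom(32)
    assert open_(enc_key, mac_key, seal(enc_key, mac_key, b"hello")) == b"hello"
```

The relevant point for the analogy is that companies adopt these compositions as off-the-shelf defaults rather than designing bespoke cryptography.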
This is IMO an insufficiently emphasised advantage. Modular alignment primitives are probably going to be easier to incorporate into extant systems. This will probably boost adoption relative to bespoke solutions (or approaches that require designing systems for safety from scratch).
If there’s a convenient/compelling alignment-as-a-service offering, then even organisations not willing to pay the time/development cost of alignment themselves may adopt it (no one genuinely wants to build misaligned systems).
I.e. if we minimise (or eliminate) the alignment tax, organisations would be much more willing to pay what remains of it, and modular assemblages seem like a compelling platform for such an offering (rough sketch below).
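A hedged sketch of what "bolting modular primitives onto an extant system" might look like. Everything here is hypothetical (the `aligned_call` wrapper, the placeholder checks, the `generate` function being wrapped); the point is only the shape: each primitive is a self-contained check, and the assemblage is a wrapper dropped around an existing model call rather than a redesign of the system.

```python
# Hypothetical sketch of a modular alignment assemblage wrapped around an
# existing model call. The primitives and checks below are placeholders,
# not a real safety stack.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PrimitiveResult:
    ok: bool
    reason: str = ""


# An "alignment primitive" here is just a self-contained check on a
# (prompt, response) pair. Real primitives would be far more substantial.
Primitive = Callable[[str, str], PrimitiveResult]


def no_secrets(prompt: str, response: str) -> PrimitiveResult:
    # Placeholder: flag responses that echo credential-looking strings.
    flagged = "api_key" in response.lower()
    return PrimitiveResult(ok=not flagged, reason="possible credential leak" if flagged else "")


def stays_on_topic(prompt: str, response: str) -> PrimitiveResult:
    # Placeholder: trivially passes; a real check might call a classifier.
    return PrimitiveResult(ok=True)


def aligned_call(generate: Callable[[str], str], primitives: List[Primitive], prompt: str) -> str:
    """Wrap an existing `generate` function with a stack of modular checks."""
    response = generate(prompt)
    for check in primitives:
        result = check(prompt, response)
        if not result.ok:
            # Fall back to a refusal rather than returning the flagged output.
            return f"[withheld: {result.reason}]"
    return response


if __name__ == "__main__":
    # The underlying system is untouched; the assemblage is purely additive.
    def my_existing_model(prompt: str) -> str:  # stand-in for the extant system
        return f"echo: {prompt}"

    print(aligned_call(my_existing_model, [no_secrets, stays_on_topic], "hello"))
```

An alignment-as-a-service offering might then amount to the same wrapper with the primitives maintained and served by a third party, keeping the integration cost for the adopting organisation close to the cost of one function call.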
If successful, this research agenda could “solve” the coordination problems.
This just might work. For a little while, anyway.