I see the largest challenge here as being that it doesn’t deal with the fundamental reasons why AI misalignment is a concern. Bureaucracies are notorious homes to Goodhart effects, and they have as yet found no way to totally control them. Classic examples include managers optimizing for their own promotion, or for the performance of their department, to the detriment of the larger organization and whatever its goals are.
Now, to be fair, bureaucracies do manage to achieve a limited level of alignment, and they can use various mechanisms that generate more vs. less alignment (examples of things that help: OKRs, KPIs, mission and value statements). They also sometimes manage to rein in what would otherwise be threats to political stability (ambitious people direct their ambition toward moving up in the bureaucracy rather than taking over a country). But they also sometimes fail at this when you get smart actors who know how to game the system.
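To make that dynamic concrete, here is a minimal toy sketch in Python (the proxy metric, the effort split, and the 3x payoff are all my own illustrative assumptions, not anything from the post): an agent that pours effort into inflating the measured metric looks better and better on paper while the value it actually creates shrinks.

```python
def true_value(effort_on_mission: float) -> float:
    """The principal's actual objective: value created for the organization."""
    return effort_on_mission

def proxy_metric(effort_on_mission: float, effort_on_gaming: float) -> float:
    """What the bureaucracy measures; easy to inflate by gaming it."""
    return effort_on_mission + 3.0 * effort_on_gaming  # gaming pays off 3x on paper

def allocate(budget: float, gaming_share: float) -> tuple[float, float]:
    """A manager splits a fixed effort budget between the mission and metric-gaming."""
    return budget * (1 - gaming_share), budget * gaming_share

budget = 10.0
for gaming_share in (0.0, 0.5, 0.9):
    mission, gaming = allocate(budget, gaming_share)
    print(f"gaming_share={gaming_share:.1f}  "
          f"proxy={proxy_metric(mission, gaming):5.1f}  "
          f"true_value={true_value(mission):5.1f}")
# As gaming_share rises, the measured proxy climbs while the true value falls:
# the agent that optimizes the measure rather than the mission looks best on paper.
```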
I’d argue that Robert Moses is a great example of this kind of gaming. He successfully moved up the bureaucracy in New York to, yes, get stuff done, but he also did it in ways that many people would consider “unaligned with human values,” and he achieved outcomes of mixed value: good for some people (affluent whites), bad for others (everyone else). What we can say is that he did impressive things that got him power and kept him in it.
Actually, I think this generalizes: if you want to design an alignment mechanism, you should expect that it could at least force Robert Moses to be aligned, a man who expertly gamed the system to achieve his goals in much the same way we expect transformative AI to be able to do.
Thank you for the insights. I agree with your point that “bureaucracies are notorious homes to Goodhart effects, and they have as yet found no way to totally control them.” I also agree with your intuition that “to be fair, bureaucracies do manage to achieve a limited level of alignment, and they can use various mechanisms that generate more vs. less alignment.”
I do, however, believe that an ideal type of bureaucratic structure helps with at least some forms of the alignment problem. If, for example, Drexler is right, and my conceptualization of his theory (CAIS) is right, we should expect a slow takeoff of increasingly intelligent narrow AIs that work together on different components of intelligence or on completing intelligent tasks. In that case, I think Weber’s suggestions, both on how to create generally controllable intelligent agents (Beamte) and on constraining individual agents’ authority to certain tasks, with agents nominated to higher tasks by those with more authority (weight, success, tenure, etc.), have something helpful to say about the design of narrow agents that might work together toward a common goal.
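To check my own intuition here, a rough and purely hypothetical sketch of that arrangement in Python (the class names, tasks, and supervisor logic are my own assumptions, not anything from Drexler or Weber): each narrow agent is authorized for exactly one kind of task and refuses anything outside it, while a higher-authority agent composes their outputs toward a common goal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class NarrowAgent:
    """A Weber-style 'machine Beamte': competent only within its certified task scope."""
    name: str
    authorized_task: str
    perform: Callable[[str], str]

    def handle(self, task: str, payload: str) -> str:
        # Authority is constrained: the agent refuses anything outside its remit.
        if task != self.authorized_task:
            raise PermissionError(f"{self.name} is not authorized for task '{task}'")
        return self.perform(payload)

@dataclass
class Supervisor:
    """A higher-authority agent that delegates subtasks rather than performing them."""
    roster: Dict[str, NarrowAgent] = field(default_factory=dict)

    def appoint(self, agent: NarrowAgent) -> None:
        self.roster[agent.authorized_task] = agent

    def pursue_goal(self, subtasks: List[Tuple[str, str]]) -> List[str]:
        return [self.roster[task].handle(task, payload) for task, payload in subtasks]

# Two illustrative narrow competencies working toward one goal.
summarizer = NarrowAgent("summarizer", "summarize", lambda text: text[:20] + "...")
translator = NarrowAgent("translator", "translate", lambda text: f"[translated] {text}")

office = Supervisor()
office.appoint(summarizer)
office.appoint(translator)

print(office.pursue_goal([("summarize", "A long policy document about zoning rules."),
                          ("translate", "Guten Tag")]))
```

The only point of the sketch is that constrained authority can be enforced mechanically at the interface; whether that constraint holds up against much more capable agents is exactly the open question you raise.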
My thoughts here are still in progress and I’m planning to spend time with these two recent posts in particular to help my understanding:
https://www.lesswrong.com/posts/Fji2nHBaB6SjdSscr/safer-sandboxing-via-collective-separation
https://www.lesswrong.com/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models
One final thing I would add is that I think many of the problems with bureaucracies can be characterized in terms of limits on information and communication (along with how agents are trained, how they are motivated, and what the most practical or useful levels of hierarchy or discretion are). I think the growth of increasingly intelligent narrow AIs could (under the right circumstances) drastically reduce information and communication problems.
Thanks again for your comment. The feedback is helpful. I hope to make additional posts in the near future to try to further develop these ideas.
Yeah, I guess I should say that I’m often worried about the big problem of superintelligent AI and not thinking much about how to control narrow, not generally capable AI. For weak AI, this kind of prosaic control mechanism might be reasonable. Christiano thinks this class of methods might work on stronger AI.
I think this approach may have something to add to Christiano’s method, but I need to give it more thought.
I don’t think it is yet clear how this structure could help with the big problem of superintelligent AI. The only contributions I see clearly enough at this point are redundant with arguments made elsewhere. Take, for example, the notion of a “machine beamte” as an agent that can be controlled through (1) appropriate training and certification, (2) various motivations and incentives for aligning behavior with the knowledge from training, and (3) nomination by a higher authority for more influence. These are not novel considerations, of course, but I think they point to the same types of concerns about how to control agent behavior in an aligned way when the individual intelligent agents may have components that are not completely aligned with the goal function of the principal (the organization in this context; humanity keeping superintelligent AI under control in another potential context).
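For my own clarity, here is a tiny hypothetical Python sketch of that three-part control scheme (the names, thresholds, and reward values are illustrative assumptions, not a worked-out proposal): the agent is deployed only after certification, is rewarded for behaving as trained, and can gain influence only by nomination from a higher authority.

```python
from dataclasses import dataclass

@dataclass
class MachineBeamte:
    name: str
    certified: bool = False
    influence: int = 1  # scope of decisions the agent is permitted to make

    def train_and_certify(self, exam_score: float, passing_score: float = 0.9) -> None:
        # (1) Appropriate training and certification before any authority is granted.
        self.certified = exam_score >= passing_score

    def reward(self, behavior_matches_training: bool) -> float:
        # (2) Incentives tied to acting on the knowledge from training.
        return 1.0 if behavior_matches_training else -1.0

class HigherAuthority:
    def nominate(self, agent: MachineBeamte, track_record: float) -> None:
        # (3) More influence only by nomination from above, based on demonstrated alignment.
        if agent.certified and track_record > 0:
            agent.influence += 1

clerk = MachineBeamte("clerk-01")
clerk.train_and_certify(exam_score=0.95)
track_record = sum(clerk.reward(behavior_matches_training=True) for _ in range(3))
HigherAuthority().nominate(clerk, track_record)
print(clerk.certified, clerk.influence)  # True 2
```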
Thanks for the follow up.