Thank you for the insights. I agree with your point that “bureaucracies are notorious homes to Goodhart effects and they have as yet found no way to totally control them.” I also agree with your intuition that “to be fair bureaucracies do manage to achieve a limited level of alignment, and they can use various mechanisms that generate more vs. less alignment.”
I do, however, believe that an ideal type of bureaucratic structure helps with at least some forms of the alignment problem. If, for example, Drexler is right, and my reading of his theory (CAIS) is right, we should expect a slow takeoff of increasingly intelligent narrow AIs that work together on different components of intelligence or on completing intelligent tasks. In that case, I think Weber’s suggestions, both on how to create generally controllable intelligent agents (Beamte) and on constraining individual agents’ authority to certain tasks, with nomination to higher tasks by those with more authority (weight, success, tenure, etc.), have something helpful to say about the design of narrow agents that might work together towards a common goal.
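To make that concrete, here is a minimal toy sketch in Python of what “authority constrained to certain tasks, widened only by nomination from a higher authority” might look like for a collection of narrow agents. The names (NarrowAgent, Principal, track_record, etc.) are hypothetical and mine, not anything from Weber or Drexler; this is an illustration of the structure, not an implementation of CAIS.

```python
from dataclasses import dataclass, field

@dataclass
class NarrowAgent:
    """A narrow agent whose authority is limited to specific task types (its 'office')."""
    name: str
    authorized_tasks: set[str]
    track_record: float = 0.0  # accumulated success, consulted when deciding nominations

    def can_handle(self, task: str) -> bool:
        return task in self.authorized_tasks


@dataclass
class Principal:
    """A higher authority that delegates tasks and widens an agent's authority by nomination."""
    agents: list = field(default_factory=list)

    def delegate(self, task: str) -> NarrowAgent:
        # Only agents whose certified scope covers the task are eligible;
        # no agent acts outside its authorized tasks.
        eligible = [a for a in self.agents if a.can_handle(task)]
        if not eligible:
            raise ValueError(f"no agent authorized for task: {task}")
        return max(eligible, key=lambda a: a.track_record)

    def nominate(self, agent: NarrowAgent, new_task: str, threshold: float = 1.0) -> None:
        # Authority expands only through the principal, and only after a
        # sufficient track record, a crude stand-in for Weber's promotion rules.
        if agent.track_record >= threshold:
            agent.authorized_tasks.add(new_task)
```

The point is purely structural: a mis-scoped request fails loudly rather than being handled by an unauthorized agent, and an agent’s scope widens only through the principal, never through the agent itself.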
My thoughts here are still in progress and I’m planning to spend time with these two recent posts in particular to help my understanding:
https://www.lesswrong.com/posts/Fji2nHBaB6SjdSscr/safer-sandboxing-via-collective-separation
https://www.lesswrong.com/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models
One final thing I would add is that many of the problems with bureaucracies can be characterized as limits of information and communication (along with how agents are trained, how they are motivated, and what the most practical or useful levels of hierarchy or discretion are). I think the growth of increasingly intelligent narrow AIs could, under the right circumstances, drastically reduce those information and communication problems.
Thanks again for your comment. The feedback is helpful. I hope to make additional posts in the near future to further develop these ideas.
Yeah, I guess I should say that I’m mostly worried about the big problem of superintelligent AI and not thinking much about how to control narrow, not generally capable AI. For weak AI, this kind of prosaic control mechanism might be reasonable. Christiano thinks this class of methods might work on stronger AI.
I think this approach may have something to add to Christiano’s method, but I need to give it more thought.
I don’t think it is yet clear how this structure could help with the big problem of superintelligent AI. The only contributions I see clearly enough at this point are redundant with arguments made elsewhere. Take, for example, the notion of a “machine Beamte”: an agent that can be controlled through (1) appropriate training and certification, (2) various motivations and incentives for aligning behavior with the knowledge from training, and (3) nomination by a higher authority for more influence. These are not novel considerations, of course, but I think they point to the same kinds of concerns about how to control agent behavior in an aligned way when the individual intelligent agents may have components that are not completely aligned with the goal function of the principal (the organization in this context; keeping superintelligent AI controlled by humanity in another potential context).
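As a purely illustrative sketch, the three control channels could be kept separate like this. The names (MachineBeamte, certify, nominate, the 0.9/0.1 update, etc.) are hypothetical and chosen only for illustration; this is a toy model of the idea, not a proposed alignment method.

```python
from dataclasses import dataclass

@dataclass
class MachineBeamte:
    """Toy state for a 'machine Beamte' and the three control channels discussed above."""
    certified: bool = False       # (1) passed training/certification
    alignment_score: float = 0.0  # (2) running measure of incentive-compatible behavior
    authority_level: int = 0      # (3) influence granted by a higher authority


def certify(agent: MachineBeamte, passed_evaluation: bool) -> None:
    # (1) Certification gates deployment on demonstrated training performance.
    agent.certified = passed_evaluation


def reward_aligned_behavior(agent: MachineBeamte, observed_alignment: float) -> None:
    # (2) Incentives update the agent's standing based on how closely its
    # observed behavior matches what its training and instructions call for.
    agent.alignment_score = 0.9 * agent.alignment_score + 0.1 * observed_alignment


def nominate(agent: MachineBeamte, principal_approves: bool, min_score: float = 0.8) -> None:
    # (3) Influence is expanded only by a higher authority, and only for agents
    # that are both certified and consistently aligned.
    if principal_approves and agent.certified and agent.alignment_score >= min_score:
        agent.authority_level += 1
```

The separation is the point: training-time certification, run-time incentives, and promotion by a principal are three distinct levers, and the question is how much alignment they buy when the agent’s internals are not fully aligned to begin with.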
Thanks for the follow up.