Maybe roles—or something like that—are the connecting element.
Disclaimer: I’m not too familiar with either AF or CAIS, being just an LW regular.
I have been thinking about the unsolved principal-agent problem (PAP) for quite a while, both theoretically, as a possible route to solving the AI alignment problem, and practically: I work as the CTO of a growing company, and we have a growing number of agents that need alignment ;-)
It appears that companies have mostly found relatively reliable ways to solve the PAP in practice; the methods are taught to and used by MBAs. There is no mathematical theory that explains how companies actually pull this off; it looks more like engineering to me, social engineering if you like. In my own management I want to apply evidence-based methods, and I had hoped to find clear, proven ones. I read management advice with an eye out for possible mathematical principles. I don’t claim to have found any, but I am building an intuition of what they could look like.
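To be clear about what a mathematical treatment even looks like, the textbook hidden-action model from contract theory goes roughly like this (my sketch, standard notation):

$$\max_{w(\cdot),\,a}\ \mathbb{E}\left[x - w(x) \mid a\right]$$

$$\text{s.t.}\quad a \in \arg\max_{a'}\ \mathbb{E}\left[u(w(x)) \mid a'\right] - c(a') \qquad \text{(incentive compatibility)}$$

$$\mathbb{E}\left[u(w(x)) \mid a\right] - c(a) \ \ge\ \bar{u} \qquad \text{(participation)}$$

Here the principal picks a pay schedule $w(\cdot)$, the agent picks an unobserved action $a$ at cost $c(a)$, output $x$ is drawn from a distribution that depends on $a$, and $\bar{u}$ is the agent’s outside option. That captures incentives for a single task, but it says nothing about what a role or a process is or how one gets established, which is the part companies seem to have engineered.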
Key elements are roles and processes. You hear a lot that you need to have them, but what actually is a role, or a process? Where does it come from? How is it established? I have established a few processes in our growing startup, always wondering what I was really doing: trying to notice and make explicit what caused the change, noticing phase transitions in growth, and how, with a growing number of agents, existing rules stop working (or start working, or rather become efficient compared to the alternatives). A lot of why this works comes down to common knowledge: creating it, or using it.
What does that mean for box inversion? I tried to apply the intuitions I have built to the box inversion hypothesis, and my proposal is that the connecting element could be something like roles. When an agent delegates something to a sub-agent (as in AF), “delegating” means an expectation that the sub-agent conforms to a role. In CAIS it is the other way around: many participants find themselves in roles defined by the system and push against that.
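To make that concrete, here is a toy Python sketch (purely illustrative; the names and the example are mine): a role as a declared interface, and delegation as handing work to whatever fills that interface, so the principal relies on the role rather than on the sub-agent’s internals.

```python
from typing import Protocol


class ReportWriter(Protocol):
    """A 'role': the interface the principal expects the delegate to fill."""

    def write_report(self, data: dict) -> str:
        ...


def delegate(agent: ReportWriter, data: dict) -> str:
    # Delegation as role-conformance: the principal relies on the role's
    # contract, not on the sub-agent's internals or its actual goals.
    return agent.write_report(data)


class JuniorAnalyst:
    # Happens to conform to the role; its internals are its own business.
    def write_report(self, data: dict) -> str:
        return f"Report covering {len(data)} items."


print(delegate(JuniorAnalyst(), {"q1": 10, "q2": 12}))
```

In the CAIS picture the arrow flips: the interfaces already exist as part of the service ecosystem, and participants have to fit themselves into them, or push against them.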
Not sure any of this makes sense, and there is certainly no hidden analogy to physics or anything like that. Just my 2ct.
I would love to see (and contribute to, if you want to collaborate) a post on “what are roles and processes” in human organizations, and how they might apply to agent alignment topics. I spend a lot of my time and energy at work (Principal Engineer at a very large company, somewhat similar to the CTO of a 150-person division) formalizing processes, encouraging people and teams to adopt them, and helping them understand the roles they need to embrace in order to have the (positive) impact we all want.
There’s an interesting mix in this work. Some of it is identifying goals we share and looking for ways to measure and get better at furthering them. But some of it is normalizing the goals themselves: not exactly “alignment”, but “finding and formalizing mutually beneficial utility trades”. These are visible, causal trades; nothing fancy, except that they’re rarely encoded as actual written agreements. They live as informal beliefs in employees’ heads, based on implicit relationships between teams or with customers.