Some thoughts (I don't have any background in related work, but this seemed interesting).
I think it would be interesting to see what you find if you look into the state of existing research on AI coordination / delegation / systemic interactions, and whether any of it feels related. I'd be mildly surprised if people have studied exactly this, but I expect there are many relevant posts/papers.
In terms of related stuff on LessWrong, I can't find it now, but Paul Christiano has a post on worlds where things go badly slowly (possibly "What failure looks like"), and I think this would be kind of in that genre.
I think this is an interesting thing to consider, and it feels somewhat related to Dan Hendrycks' "Natural Selection Favors AIs over Humans" (https://arxiv.org/abs/2303.16200). The connection in my head is "what does an AI ecosystem look like", "what does it mean to discuss alignment in this context", "what outcomes will this system tend towards", etc. Just as middle managers with certain traits get selected for, so, more generally, AI systems with certain properties get selected for.
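To make the selection intuition concrete, here's a toy replicator sketch (entirely my own construction, not from the paper; the fitness function and all numbers are made up for illustration). The assumption baked in is that less-aligned, more proxy-chasing systems get a small competitive edge, and even a small edge compounds:

```python
import random

# Toy selection dynamics: a population of AI "systems", each with an
# alignment level in [0, 1]. By assumption, less-aligned systems grow
# slightly faster. All parameters here are invented for illustration.
random.seed(0)
population = [{"alignment": random.random()} for _ in range(1000)]

def growth_rate(system):
    # Hypothetical fitness: a 5% maximum edge for low-alignment systems.
    return 1.0 + 0.05 * (1.0 - system["alignment"])

for generation in range(100):
    # Each system replicates in proportion to its growth rate.
    weights = [growth_rate(s) for s in population]
    population = random.choices(population, weights=weights, k=len(population))

mean_alignment = sum(s["alignment"] for s in population) / len(population)
print(f"mean alignment after selection: {mean_alignment:.2f}")  # below the initial ~0.5
```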
You might also want to read about Ought's agenda of supervising processes, not outcomes, which feels relevant here.
Recursive middle manager hell feels somewhat related to inner misalignment / misaligned mesa-optimizers. Instead of the mesa-optimizer being a subset of the processing inside an LLM (which is how I normally think about it, though maybe not how others do), your AI system is made of many layers, and it's plausible that intermediate layers end up optimizing proxies for inputs to what you care about, rather than the thing itself. On this view, the misalignment of middle managers, which usually makes companies less effective, might simply lead to selection against such systems relative to systems with fewer of these properties.
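To gesture at the layered-proxy worry, here's a minimal sketch (again my own made-up model, with an assumed FIDELITY parameter): each layer reports a mix of what it received and its own local incentive, modeled as noise uncorrelated with ground truth, so the top-level signal decorrelates from the thing you actually care about as depth grows:

```python
import random
import statistics

# Toy model of proxy drift through layers of delegation. Each layer's
# report is partly its input and partly its own local incentive (noise).
random.seed(0)
FIDELITY = 0.8  # assumed fraction of each layer's report driven by its input

def top_report(ground_truth, depth):
    report = ground_truth
    for _ in range(depth):
        # Mix the received signal with this layer's local incentive.
        report = FIDELITY * report + (1 - FIDELITY) * random.gauss(0, 1)
    return report

for depth in (1, 3, 6, 10):
    truths = [random.gauss(0, 1) for _ in range(5000)]
    reports = [top_report(t, depth) for t in truths]
    corr = statistics.correlation(truths, reports)  # decays toward zero with depth
    print(f"depth={depth:2d}  corr(report, ground truth) = {corr:.2f}")
```

The point of the sketch is just that nobody in the stack has to be adversarial: mild per-layer proxy pressure is enough to make deep hierarchies lose track of the original objective.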
There might be some strategically valuable research to be done here, but it's not super clear to me what the theory of change would be. Maybe there's something to do with bandwidth / scalability tradeoffs that affect how tightly coupled vs. diffuse/distributed useful/popular AI systems will be in the future.