It turns out that the alignment problem has some known solutions in the human case. First, there is an interesting special case, namely where there are no decisions (or only a limited number of fully accounted-for decisions) for the intelligent agent to make—basically throwing all decision-making capabilities out of the window and using only object recognition and motion control (to use technical terms). With such an agent (we might call it a zero-decision agent, or zero-agent) scientific methods could be applied to all details of the work process and high efficiency reached: Scientific Management (also known as Taylorism). Obviously the unions hated it, and it was later outlawed. I think something might be learned from this approach for AI control: maybe we can build on top of a known solution for a zero-agent and prove that certain known decision classes are also safe.
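To make the zero-agent idea concrete, here is a minimal sketch in Python (hypothetical names, not an implementation of anything that exists): the agent only exposes object recognition and motion control, and the entire work process is a fixed, externally audited script, so there are no open-ended decisions left to align.

    # Minimal sketch of a "zero-agent": perception and motion control only,
    # with the whole work process specified externally (the Scientific
    # Management part). All names here are hypothetical.

    from dataclasses import dataclass
    from typing import Callable, List, Tuple


    @dataclass
    class WorkStep:
        """One fully specified step of the work process."""
        target_object: str                              # what to look for
        motion: Callable[[Tuple[float, float]], None]   # how to act on its position


    class ZeroAgent:
        """No decision-making: the agent only recognizes objects and executes motions."""

        def __init__(self, recognize: Callable[[str], Tuple[float, float]]):
            self.recognize = recognize                  # object-recognition primitive

        def run(self, process: List[WorkStep]) -> None:
            # The agent never chooses what to do; it executes the audited process verbatim.
            for step in process:
                position = self.recognize(step.target_object)
                step.motion(position)

Under this framing, proving that a given decision class is safe would amount to something like proving properties of the process handed to the agent, rather than of the agent itself.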
Maybe other insights from management theory—which is, after all, about aligning intelligent agents—could also transfer. The alignment problem is called the Principal-Agent Problem in that literature, and there are quite a few Solutions to Principal-Agent Problems in Firms (Gary Miller, 2005). The approaches should sound familiar (a toy sketch of the first one follows the list below): Solutions Based on
Incentives Linked to Agent Outcomes,
Direct Monitoring of Agent Actions,
Cooperation Between Principal and Agent, and
Cooperation within Teams
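To make the first of these concrete, here is a toy sketch (hypothetical numbers and function names, not taken from Miller's paper): the principal offers a bonus per unit of output, the agent picks whichever effort level maximizes its own payoff, and the principal searches for the bonus under which the agent's self-interested choice is also acceptable to the principal.

    # Toy principal-agent model: incentives linked to agent outcomes.
    # All numbers are hypothetical.

    EFFORT_LEVELS = [0, 1, 2, 3]        # effort choices available to the agent
    OUTPUT_PER_EFFORT = 10              # units produced per unit of effort


    def effort_cost(effort: int) -> float:
        """The agent's private cost of exerting effort."""
        return effort ** 2


    def agent_best_response(bonus_per_unit: float) -> int:
        """The agent maximizes its own payoff (pay minus effort cost), not the principal's."""
        return max(EFFORT_LEVELS,
                   key=lambda e: bonus_per_unit * OUTPUT_PER_EFFORT * e - effort_cost(e))


    def principal_profit(bonus_per_unit: float) -> float:
        """Revenue minus incentive pay, given the agent's self-interested effort choice."""
        output = OUTPUT_PER_EFFORT * agent_best_response(bonus_per_unit)
        return output - bonus_per_unit * output


    # The principal aligns the agent by choosing the outcome-linked incentive under
    # which the agent's own optimum is also what the principal wants.
    best_bonus = max((b / 10 for b in range(1, 10)), key=principal_profit)
    print(best_bonus, agent_best_response(best_bonus), principal_profit(best_bonus))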
Tangentially related on LessWrong: The AI Alignment Problem has already been solved once
I wonder how one could outlaw a thing like this. Suppose that most managers believe that Taylorism works, but it is illegal to use it (under that name). Wouldn’t they simply reintroduce the practices, step by step, under a different name? I mean, if you use a different name, different keywords, a different rationalization, and introduce it in small steps, it’s no longer the same thing, right? It just becomes “industry standards”. (If there happens to be an exact definition, of course, this only becomes an exercise in how close to the forbidden thing you can legally get.)
From the Wikipedia article, I got the impression that what was made illegal was the use of the stop-watch. Okay, so instead of measuring how many seconds you need to make a widget, I am going to measure how many widgets you make each day—that is legal, right? The main difference is that you can take a break, assuming it will allow you to work faster afterwards. Which may be quite an important difference. Is this what it is about?
I assume that that’s what happened. Some ideas from scientific management were taken and applied in less extreme ways.
I think there’s something here, but it’s usually thought of the other way around, i.e. solving AI alignment implies solving human alignment, but the opposite is not necessarily true because humans are less general intelligences than AI.
Also, consider that your example of Taylorism is a case study in an alignment mechanism failing: it tried to align the organization, but it spawned the creation of a subagent (the union) that caused the organization to do something management might have considered worse than the potential gains given up by not applying Taylorism.
Anyway, this is a topic that’s come up a few times on LessWrong; I don’t have links handy, but you should be able to find them via search.
I’m not trying to prove full alignment from these. It is more like a) a case study of actual efforts to align intelligent agents by formal means and b) an identification of the conditions under which this does succeed.
Regarding its failure: it seems that a close reading of the history doesn’t bear that out: a) Taylorism didn’t fail within the factories, and b) the unions were not founded within these factories (by their workers) but existed before and pursued their own agendas. Clearly, real humans have a life outside of factories and can use that to coordinate—something that wouldn’t hold for a zero-agent AI.
I tried to find examples on LW and elsewhere; that is what turned up the link at the bottom. I have been on LW for quite a while and have not seen this discussed in this way. I have searched again, and all searches involving combinations of human intelligence, alignment, and miscellaneous words for analogy or comparison turn up not much more than this one, which matches only because of its size:
https://www.lesswrong.com/posts/5bd75cc58225bf0670375575/the-learning-theoretic-ai-alignment-research-agenda
Can you suggest better ones?
Thank you for your detailed reply. I was already wondering whether anybody saw these shortform posts at all. They were promoted at one time, but it seems hard to notice them with the current UI. How did you spot this post?
I read LW via /allPosts and they show up there for me. Not sure if that’s the default or not since you can configure the feed, which I’m sure I’ve done some of but I can’t remember what.
The /allPosts page is pretty useful. Thank you!