I think decision making can have an impact on values, but that this depends on the design of the agent. In my comment, by values, I had in mind something like “the thing that the agent is maximizing”. We can imagine an agent like the paperclip maximizer for which the “decision making” ability of the agent doesn’t change the agent’s values. Is this agent in an “epistemic pit”? I think the agent is in a “pit” from our perspective, but it’s not clear that the pit is epistemic. One could model the paperclip maximizer as an agent whose epistemology is fine but that simply values different things than we do. In the same way, I think people could be worried about amplifying humans because they are worried that those humans will get stuck in non-epistemic pits rather than epistemic pits. For example, I think the discussion in other comments related to slavery partly has to do with this issue.
The extent to which a human’s “values” will improve as a result of improvements in “decision making” seems to me to depend on having a psychological model of humans, which won’t necessarily be a good model of an AI. As a result, people may agree/disagree in their intuitions about amplification as applied to humans due to similarities/differences in their psychological models of those humans, while their agreement/disagreement on amplification as applied to AI may result from different factors. In this sense, I’m not sure that different intuitions about amplifying humans are necessarily a crux for differences about amplifying AIs.
In general I expect amplification to improve decision-making processes substantially, but in most cases to not improve them enough.
To me, this seems like a good candidate for something close to a crux between “optimistic” alignment strategies (similar to amplification/distillation) and “pessimistic” alignment strategies (similar to agent foundations). I see it like this. The “optimistic” approach is more optimistic that certain aspects of metaphilosophical competence can be learned during the learning process or else addressed by fairly simple adjustments to the design of the learning process, whereas the “pessimistic” approach is pessimistic that solutions to certain issues can be learned, and so holds that we need to do a lot of hard work to obtain solutions to these problems that we understand on a deep level before we can align an agent. I’m not sure which is correct, but I do think this is a critical difference in the rationales underlying these approaches.