paulfchristiano comments on The limits of corrigibility

paulfchristiano 16 May 2018 5:50 UTC
7 points
I guess my point is that there are open questions about how to protect against value drift caused by AI, what the AI should do when the user doesn’t have much idea of how they want their values to be pushed around, and how to get the AI to competently help the user with moral questions, which seem to be orthogonal to how to make the AI corrigible. I think you don’t necessarily disagree but just see these as lower priority problems than corrigibility? Without arguing about that, perhaps we can agree that listing these explicitly at least makes it clearer what problems corrigibility by itself can and can’t solve?
I agree with all of this. Yes, I see these other problems as (significantly) lower priority problems than alignment/corrigibility. But I do agree that it’s worth listing those problems explicitly.
My current guess is that the most serious non-alignment AI problems are:
1. AI will enable access to destructive physical technologies (without corresponding improvements in coordination).
2. AI will enable access to more AI, not covered by existing alignment techniques (without corresponding speedups in alignment).
These are both related to the more general problem: “Relative to humans, AI might be even better at tasks with rapid feedback relative to tasks without rapid feedback.” Moral/philosophical competence is also related to that general problem.
I typically list this more general problem prominently (as opposed to all of the other particular problems possibly posed by AI), because I think it’s especially important. (I may also be influenced by the fact that iterated amplification or debate also seem like good approaches to this problem.)
This seems fine, as long as people who need to make strategic decisions about AI safety are aware of this, and whatever separate work that needs to be done is compatible with your basic approach.
I agree with this.
(I expect we disagree about practical recommendations, because we disagree about the magnitude of different problems.)
open questions about how to protect against value drift caused by AI
Do you see this problem as much different / more serious than value drift caused by other technology? (E.g. by changing how we interact with each other?)
- Wei Dai 17 May 2018 23:50 UTC
  3 points
  
  I typically list this more general problem prominently (as opposed to all of the other particular problems possibly posed by AI), because I think it’s especially important.
  
  Have you written about this in a post or paper somewhere? (I’m thinking of writing a post about this and related topics and would like to read and build upon existing literature.)
  
  Do you see this problem as much different / more serious than value drift caused by other technology? (E.g. by changing how we interact with each other?)
  
  What other technology are you thinking of, that might have an effect comparable to AI? As far as how we interact with each other, it seems likely that once superintelligent AIs come into existence, all or most interactions between humans will be mediated through AIs, which surely will have a much greater effect than any other change in communications technology?
  - paulfchristiano 27 May 2018 3:43 UTC
    5 points
    Have you written about this in a post or paper somewhere? (I’m thinking of writing a post about this and related topics and would like to read and build upon existing literature.)
    Not usefully. If I had to link to something on it, I might link to the Ought mission page, but I don’t have any substantive analysis to point to.
    As far as how we interact with each other, it seems likely that once superintelligent AIs come into existence, all or most interactions between humans will be mediated through AIs, which surely will have a much greater effect than any other change in communications technology?
    I agree with “larger effect than historical changes” but not “larger effect than all changes that we could speculate about” or even “larger effect than all changes between now and when superintelligent AIs come into existence.”
    If AI is aligned, then it’s also worth noting that this effect is large but not obviously unusually disruptive, since e.g. the AI is trying to think about how to minimize it (though it may be doing that imperfectly).
    As a random example, it seems plausible to me that changes to the way society is organized (what kinds of jobs people do, compulsory schooling, weaker connections to family, lower religiosity) over the last few centuries have had a larger unendorsed impact on values than AI will. I don’t see any principled reason to expect those changes to be positive while the changes from AI are negative. It seems like in expectation both of them would be positive but for the opportunity cost effect (where today we have the option to let our values and views change in whatever way we most endorse, and we foreclose this option when we let our values drift anything less than maximally-reflectively).