Good to see Anthropic is serious; they seem better than OpenAI.
A few general questions that don’t seem to be addressed:
There is a belief that AI is more dangerous the more different it is from us. Isn't this a general reason to build it to be as much like us as possible? For example, isn't mind uploading/Whole Brain Emulation a better approach, if it is possible? If it is obviously too slow, then could we make the AI at least follow our evolutionary trajectory as much as possible?
There is justified concern about behavior changing a lot when the system becomes situationally aware/self-aware. Whether to delay this or to bring it on sooner doesn't seem to be discussed at all. Wouldn't it be worthwhile to make the AI as self-aware as possible while it is still below human-level AGI, so we can observe the changes as they happen? Otherwise it seems this will happen unpredictably, which is hardly good.
I have some more detailed comments/questions, but first I want to be sure there aren't obvious answers to these.
“AI is more dangerous the more different it is from us” seems wrong to me: it is very different and likely to be very dangerous, but that doesn’t imply that making it somewhat more like us would make it less dangerous. I don’t think brain emulation can be developed in time, replaying evolution seems unhelpful to me, and both seem likely to cause enormous suffering (aka mindcrime).
See my colleague Ethan Perez’s comment here on upcoming research, including studying situational awareness as a risk factor for deceptive misalignment.
Thanks. OK, I will set out some more general thoughts; I have to go back a few steps.
To me the more general alignment problem is that AI gives humanity ~10,000 years of progress, and probably irreversible change, in ~1-10 years. The issue is how you raise human intelligence from the level given by biology to the level given by the limits of physics in a way that is as identity-preserving as possible. Building AI seems like the worst way to do that. If I had a fantasy approach, it would be something like increasing everyone's IQ by 10 points per year for 100+ years until we reach the limit.
We can't do that, but that is why I mentioned WBE: my desire would be to stop AGI, get human mind uploading to work, then let those WBEs raise their IQ in parallel. Their agreed-upon values would then be humanity's values by definition.
If our goal is Coherent Extrapolated Volition or something similar for humanity, then how can we achieve that if we don't increase the IQ of humans (or of descendants they identify with)? How can we even know what our own desires/values are at increasing IQs if we don't directly experience them?
I have an opinion about what successful alignment looks like to me, but is it very different for other people? We can all agree on what bad looks like.