The others are mainly oral, with people coming up with plans that involve simulating humans for long periods of time, me doing the equivalent of saying “have you considered value drift” and (often) the reaction from the other revealing that no, they had not considered value drift.
Because there is a large difference between what the setup will be in practice and what current research is in practice.
What are the most important differences that you foresee?
The most important differences I foresee are the unforeseen ones :-) I mean that seriously, because anything that is easy to foresee will possibly be patched before implementation.
But if we look at how research happens nowadays, it has a variety of approaches and institutional cultures, and certain levels of feedback both from within the AI safety community and from the surrounding world, grounding our morality and keeping us connected to the flow of culture (such as it is).
Most of the simulation ideas do away with that. If someone suggested that the best idea for AI safety would be to lock up AI safety researchers in an isolated internet-free house for ten years and see what they came up with, we’d be all over the flaws in this plan (and not just the opportunity costs). But replace that physical, grounded idea with a similar one that involves “simulation”, and suddenly people flip into far mode and are more willing to accept it. In practice, a simulation is likely to be far more alien and alienating than just locking people up in a house. We have certain levels of control in a simulation that we wouldn’t have in reality, but even that could hurt—I’m not sure how I would react if I knew my mind and emotions and state of tiredness were open to manipulation.
So what I’m mainly trying to say is that using simulations (or predictions about simulations) to do safety work is a difficult and subtle project, and needs to be thoroughly planned out with, at minimum, a lot of psychologists and some anthropologists. I think it can be done, but not glibly and not easily.
The others are mainly oral, with people coming up with plans that involve simulating humans for long periods of time, me doing the equivalent of saying “have you considered value drift” and (often) the reaction from the other revealing that no, they had not considered value drift.
Ah, value drift has been on my mind for so long that it’s surprising to me that people could be thinking about simulating humans for long periods of time without thinking about value drift. Thanks for the update!
The most important differences I foresee are the unforeseen ones :-) I mean that seriously, because anything that is easy to foresee will possibly be patched before implementation.
I guess my perspective here is that pretty soon we’ll be forced to live in a real environment that is quite alien / drift-inducing already, so maybe it wouldn’t be so hard to construct a virtual environment that is better by comparison. In that case, the risk-minimizing thing to do would be to put yourself in such an environment as soon as possible and then work on further risk reduction from there. (See this recent news as another sign pointing to that coming soon.)
Most of the simulation ideas do away with that.
Yeah I agree that getting the social aspect right is probably the hardest part, and we might need more than a small group of virtual humans to do that.
So what I’m mainly trying to say is that using simulations (or predictions about simulations) to do safety work is a difficult and subtle project, and needs to be thoroughly planned out with, at minimum, a lot of psychologists and some anthropologists. I think it can be done, but not glibly and not easily.
I don’t know which problems/systems you’re referring to. Maybe you could cite these in the post to give more motivation?
What are the most important differences that you foresee?
The main one is when I realised the problems with CEV: https://www.lesswrong.com/posts/vgFvnr7FefZ3s3tHp/mahatma-armstrong-ceved-to-death
I guess my perspective here is that pretty soon we’ll be forced to live in a real environment that is quite alien / drift-inducing already, so maybe it wouldn’t be so hard to construct a virtual environment that is better by comparison. In that case, the risk-minimizing thing to do would be to put yourself in such an environment as soon as possible and then work on further risk reduction from there. (See this recent news as another sign pointing to that coming soon.)
I think this framing makes sense.