Human modelling is very close to human manipulation in design space. A system with accurate models of humans is close to a system which successfully uses those models to manipulate humans.
Let me try to communicate why this sounds like magical thinking to me. Taylor is a data scientist for the local police department. Taylor notices that detectives are wasting a lot of time working on crimes that never get solved. They want to train a logistic regression on the crime database to predict whether a given crime will ever be solved, so detectives can focus their efforts on crimes that are solvable. Would you advise Taylor against this project, on the grounds that the system will be “too close in design space” to one which attempts to commit the perfect crime?
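For concreteness, here is roughly what Taylor’s project amounts to. This is a minimal sketch, not from the original discussion; the file name, column names, and features are all invented for illustration. Nothing in it goes beyond standard supervised learning on tabular data.

```python
# Hypothetical version of Taylor's solvability classifier.
# "crimes.csv", the column names, and the features are all invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("crimes.csv")

# One-hot encode categorical features; "solved" is a 0/1 label
# marking whether the case was eventually closed.
features = ["crime_type", "days_until_reported", "num_witnesses", "evidence_items"]
X = pd.get_dummies(df[features])
y = df["solved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Detectives would then triage incoming cases by predicted solvability:
# solvability = model.predict_proba(new_cases)[:, 1]
```

The system predicts an outcome about human behavior, yet the gap between this pipeline and one that “commits the perfect crime” is enormous.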
Although they do not rely on human modelling, some of these approaches nevertheless make the most sense in a context where human modelling is happening: for example, impact measures seem to make the most sense for agents that will be operating directly in the real world, and such agents are likely to require human modelling.
Let’s put AI systems into two categories: those that operate in the real world and those that don’t. The odds of x-risk from the second kind seem low, and I’m not sure what safety work would help with them, aside from making sure such a system truly does not operate in the real world. But if a system does operate in the real world, it’s probably going to learn about humans and acquire knowledge about our preferences, which means you have to solve the problems that knowledge implies.
My steelman of this section is: Find a way to create a narrow AI that puts the world on a good trajectory.