By “human interests”, do you mean something the programmers put in, leaving aside the problem of formalizing said interests out of everyone’s diverse and contradictory goals (whose intersection is probably empty, if you take a large enough slice of humanity)?
“Human interests” is meant in a vague sense—there is some sense in which an agent that cures cancer (without doing anything else that humans would consider perverse or weird) is “more beneficial” than an agent that turns everything into paperclips, regardless of how you formalize things or deal with contradictions. This paper discusses technical problems that arise no matter how you formalize “human interests.”
To answer the question that you clarify in later comments, I do not yet have even a vague satisfactory description. Formalizing “human interests” is far from a solved problem, but it’s also not a very “technical” problem at present, which is why it’s not discussed much in this agenda (though the end of section 4 points in that direction).
The same problem applies to any set of interests, though. It’s not just that default AI drives will conflict with (say) liberal humanist interests. They’d conflict with “evangelize Christianity and ensure the survival of the traditional family” too.
The same problem applies to any set of interests, though
I assume that you are talking about the problem of AI value drift, or, as OP puts it:
a smarter-than-human system [...] reliably pursues beneficial goals “aligned with human interests”
What I am asking is whether OP presumes that the problem of figuring out what “human interests” are to begin with has been solved, at least in some informal way, like “ensure a surviving, thriving and diverse humanity far into the future”, or “comply with the literal word of the scriptures”, or “live in harmony with nature”. And that’s even before we worry about the AI munchkining its way into fulfilling the goal the way a jackass genie would.
Section 4 of the document discusses value learning as an open problem involving its own challenges.

Actually, if I understand it correctly, the value problem is turning informal values into formal ones, not figuring out the informal values to begin with.

Thanks!
Rather than saying that the authors presume the problem of defining human interests has been solved, I would say that the authors are talking about a problem that also has to be solved, separately from that problem.
If we want to drive to the store, we have to both have a working car, and know how to get to the store. If the car is broken, we can fix the car. If we don’t know how to get to the store, we can look at a map. We have to do both.
If someone else wants to use the car to drive to church, we may disagree about destinations but we both want a working car. Fixing the car doesn’t “presume” that the destination question has been solved; rather, it’s necessary to get to any destination.
(OTOH, if we fix the car and the church person steals it, that would kinda suck.)
Rather than saying that the authors presume the problem of defining human interests has been solved, I would say that the authors are talking about a problem that also has to be solved, separately from that problem.
Right, I didn’t mean “OP is clueless by assuming that the problem has been solved”, but “let’s assume the problem has been solved, and work on the next step”. Probably worded it poorly, given the misunderstanding.