The same problem applies to any set of interests, though. It’s not just that default AI drives will conflict with (say) liberal humanist interests. They’d conflict with “evangelize Christianity and ensure the survival of the traditional family” too.
The same problem applies to any set of interests, though
I assume that you are talking about the problem of AI value drift, or, as OP puts it:
a smarter-than-human system [...] reliably pursues beneficial goals “aligned with human interests”
What I am asking is whether OP presumes that the problem of figuring out what “human interests” are to begin with has been solved, at least in some informal way, like “ensure a surviving, thriving, and diverse humanity far into the future”, “comply with the literal word of the scriptures”, or “live in harmony with nature”. And that is before we even worry about the AI munchkining its way into fulfilling the goal the way a jackass genie would.
Section 4 of the document discusses value learning as an open problem involving its own challenges.
Actually, if I understand it correctly, the value problem is turning informal values into formal ones, not figuring out the informal values to begin with.
Thanks!
Rather than saying that the authors presume the problem of defining human interests has been solved, I would say that the authors are talking about a problem that also has to be solved, separately from that problem.
If we want to drive to the store, we have to both have a working car and know how to get to the store. If the car is broken, we can fix the car. If we don’t know how to get to the store, we can look at a map. We have to do both.
If someone else wants to use the car to drive to church, we may disagree about destinations, but we both want a working car. Fixing the car doesn’t “presume” that the destination question has been solved; rather, it’s necessary for reaching any destination.
(OTOH, if we fix the car and the church person steals it, that would kinda suck.)
Rather than saying that the authors presume the problem of defining human interests has been solved, I would say that the authors are talking about a problem that also has to be solved, separately from that problem.
Right, I didn’t mean “OP is clueless for assuming that the problem has been solved”, but rather “let’s assume the problem has been solved, and work on the next step”. I probably worded it poorly, given the misunderstanding.