[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren25 Feb 2021 22:06 UTC

5 points

Coherent Extrapolated Volition Human Values

Hey, fellow people. I’m fairly new to LessWrong, and if this question is irrelevant I apologise. I was wondering whether any serious attempts to create a system for mapping out the CEV of humanity have been started yet.

Since CEV is more of a democratic process than any other alignment system I can think of, it would make sense to try to create a database for a future AGI to calibrate itself on. If we were to create the database now, we could explore whether it suits our needs through trial and error (testing whether it predicts moral decisions), which would mean that we would get a more functional alignment system than we could otherwise get. Also, as a follow-up, if we were to create a central authority for creating this database, there’s a possibility that the authority could become a central alignment-checking facility, meaning that we could avoid potential misalignment disasters.

To me, there are therefore quite clear reasons why it would be a good idea to start this project, which is why I’m wondering if there are any such plans.

Thank you for your time.

Gordon Seidoh Worley 26 Feb 2021 14:56 UTC
5 points
I think your question sort of misunderstands the CEV proposal. It’s something aligned AI might produce, not something we would personally work towards creating. Yes, we might keep CEV in mind when figuring out how to build aligned AI, but it’s not something we can aim at directly, or else we would Goodhart ourselves into existential catastrophe at worst or astronomical waste at best.
plex 26 Feb 2021 15:36 UTC
2 points
I think a slightly more general version of this question, referring to human values rather than specifically CEV, is maybe a fairly important point.
If we want a system to fulfill our best wishes, it needs to learn what they are based on its models of us, and if too few of us spend time trying to work out what we want in an ideal world, then the dataset it’s working from will be impoverished, perhaps to the point of causing problems.
I think addressing this is less pressing than other parts of the alignment problem, because it’s plausible that we can punt it to after the intelligence explosion, but it would maybe be nice to have some project started to collect information about idealized human values.