A new place to discuss cognitive science, ethics and human alignment
One could frame EA as the project of human alignment—of deconfusing ourselves about what we care about and figuring out how to actualize it. And there seems to be an interconnected bundle of problems at the core of this project:
What should we fundamentally value?
How to distinguish rational intuitions from biases?
How to make an AGI care about these questions?
Which ideas should we spread to help humanity process these questions more complexly?
How certain can we be about all of this?
These questions are particularly important, as we seem to live at the hinge of history and at a time of growth in consciousness research[1]. They’re interconnected but require a range of disciplines perhaps too wide for a single person to fully grasp, which suggests there could be great added value in stronger cooperation.
So if you like impossible problems and want to work on them together, you’re warmly welcome at the Mind & Values Research Group. The group is about the intersections of EA, cognitive science, the nature of consciousness and intelligence, moral philosophy and the formulation & propagation of ethical and rational principles.
How can these areas help nudge humanity in a positive direction?
1. The philosophy of mind angle
Deconfusing humanity about what we mean by values and intelligence could:
Help solve the technical side of AI alignment[2].
Advance the broad-longtermist mission introduced in What We Owe the Future: getting a clearer picture of which values we should lock in in these important times. Important topics here could be the nature of valence, the net-positiveness of experiences, and animal and digital sentience.
Help global prioritization by investigating the assumptions behind interventions, for instance those recommended by Happier Lives.
2. The social change angle
Cognitive enhancement research: how to support rationality & moral circle expansion in society by formulating the most elegant case for rational ethics.
Studying people’s biases about values could help here[3]. This area can also be advanced with some inferences from experimental philosophy or even history & sociology of ideas.
What could it look like?
Meetups: The group will vote on topics to discuss. For the following month or so, people will be welcome to collect materials from different angles. We’ll discuss them in a virtual meetup and collect what was mentioned in a document people can get back to.
Newsletter: If the group gets bigger and hard to follow, I’ll create a newsletter to announce voting, meetups and to send out the notes. Meanwhile, I recommend turning on notifications for new posts.
Networking: People looking for ideas or people to work with within an area are welcome to post even just a short introduction.
- ^
This counts against neglectedness, but it also points to brain research opening new possibilities and to existing (cognitive) resources that can be utilized. https://www.mdpi.com/2076-3425/10/1/41 https://www.proquest.com/docview/2703039855/fulltextPDF/3152D39660CF4000PQ/1?accountid=16531
- ^
See Sotala, or Superintelligence, p. 406 (“Should whole brain emulation research be promoted?”), which indicates that figuring out what human coherent extrapolated volition looks like could be of particular importance.
- ^
An example here could be the research discussed in the 80k Hours podcast with Sharon H. Rawlette and her book The Feeling of Value.
How is this even a question? If you are talking about a root utility function, that is not really something we decide, and almost by definition isn’t something an agent would change even if they could. Our terminal values simply are what they are—there absolutely is no ‘shouldness’ to them: any such ‘shouldness’ is just virtue signaling. A person generally won’t actually tell you their true terminal values even if they could (which is much of the point of mechanism design), and probably can’t regardless.
Why would we ever want to lock in values?
Thanks for the response, lots of fun prompts!
Most importantly, I believe there is shouldness to values; in particular, it sounds like a good defining feature of moral values, even though it might be an illusion that we get to decide them freely (but that seems beside the point).
I don’t think it’s clear we don’t get to edit our terminal values. I might be egoistic at core and yet I could decide to undergo an operation that would make me a pure utilitarian. It might be signalling or a computational mistake on my part but I could. It also could be that the brain’s algorithm can update the model of what it optimizes. For instance, it could be the behavioral algorithms we choose are evaluated based on a model of “good life” which has a different representation of morality depending on what we’re influenced by, which is what “choice” means if free will is an illusion.
In the case of AGI: because there’s a strong case that the values an AGI develops by default are misaligned with what we, and potential future people, (would) care about. And because some values will likely get locked in via AGI; it’s just a question of which.
For the same reason the Founding Fathers wrote the Constitution. The degree to which something is locked in is a spectrum. Will MacAskill essentially suggests “locking in” the value of epistemic & moral humility, a principle that is supposed to update with more evidence.
Therefore, the question of to what extent we want to lock in our present values is a big part of answering the question of which values we should lock in.
So by ‘root utility function’, I meant something like the result of using a superintelligent world model or oracle to predict possible futures, and then allowing the human to explore those futures and ultimately preference rank them.
So we don’t get to edit our root utility function—which is not to say we could not in theory with some hypothetical future operation as you mention—but we don’t in practice, and most would not want to.
Morality/ethics is more like an attempt to negotiate some set of cooperative instrumental values and is only loosely related to our root utility function in the sense that it ultimately steers everything.
That is not an argument for locking in values, it is an argument against. But thankfully it is not at all a given that values will get locked in. Human values seem to evolve slowly over time. A successfully aligned AGI will either model that evolution correctly (as in brain-like AGI and/or successful value learning), or be largely immune to it (through safe bounding via external empowerment, for example, or utility uncertainty). There are numerous potential paths to the goal that don’t involve any value lock in (which could be disastrous).
To the limited extent that makes sense to me, it does so as a non-technical vague analogy to utility uncertainty.
There is only one thing we want to lock in: optimization aligned with our true unknown dynamic terminal utility function.
Sounds a lot like LessWrong, but often competition is healthy ;)