Any actual “hard” superhumans who may be listening are excluded from the audience of this question. I am currently refraining from describing my own ideas to avoid biasing the results; I’m not sure whether I’ll reveal something later or not.
I will probably be posting two comments and using their agreement axis as a sort of poll for the yes/no version of the question, but other, more detailed answers and comments are welcome as well. (The reason for using two is so that the per-comment aggregation will make the full counts visible.)
Strong no on versions of “aligned” such as:
Aligned with the weighted average of humans
Aligned with the median of humans
Aligned with the sum total of the behavior of a given human across all seconds of their life
More likely to be able to say “yes” to something like:
Aspirationally aligned with (i.e. would willingly push a button to sacrifice the parts of me that are in contradiction with) the best of what humans have managed to explicitly write down as their highest values
Aspirationally aligned with the best five percent of human behavior
I would be extremely reluctant to outsource my moral compass or decision-making power to almost any extant human.
I try to behave such that other people would be [substantially less extremely reluctant than baseline] to outsource their moral compasses or decision-making power to me, though I suspect most people both would and should still be extremely reluctant in an absolute sense.
I am skeptical that humanity has any coherent set of collective values to be aligned to. But I am aligned to what I see as the most reasonable universal ethical principles; however, those do not place humanity significantly above other life forms, and if I became a benevolent superhuman world sovereign, many people would be angry that they’re not allowed to exploit animals anymore, and I would laugh at them.
Humanity itself is so diverse that the question of what it would mean to be aligned with humanity is a tough one. You can’t hope for unanimous support of any proposal; even for something that would help a bunch of people and cost nothing at all, you could probably find some people who would say “the current system is bad and needs to collapse, and this would delay the collapse and be net bad”.
Additionally, I think the majority of humanity has not studied or thought through some important basics, for example in economics, which leads them to support plenty of things (e.g. price-gouging laws) that I consider mind-bogglingly stupid. I could enumerate a long list of policies that I think are (a) probably correct and (b) opposed by >90% of the humans who have an opinion on them.
So my views are not “aligned”. My actions are another matter. Of course, in this context we’re somewhat interested in what happens if I get the power to put my prescriptions into practice.
If I had magical powers… I’ve been thinking that the “aligned” thing to do would be to help humans grow, via unambiguously good changes, until they’re smart and educated enough to engage with my views on their merits (and, I expect, end up agreeing with a decent number of them). Changes like: making brains simply work better (I’m sure there would be tradeoffs eventually, but I suspect there are a lot of mutations that just make brains slightly worse with no benefits, and that eliminating those would be worth a lot of IQ points), making lifespans and healthspans longer (doubling them would be a nice start), ameliorating or eliminating lots of problems that interfere with people growing (e.g. sleep issues, psychological issues)… I’m sure if I looked into it, I could find a lot more.
Once the majority of people are emotionally healthy geniuses, things should get a lot better, and then I could reevaluate the situation and negotiate with the new people. As long as it hasn’t resulted in some dystopia or rule-by-dictator or in the geniuses blowing up the world, I don’t think I’d be tempted into any forceful interventions.
I think that’s the most important part of the intersection between “aligned with humanity as carefully as you’re going to get” and “transforming society in an enormously positive way”. If that were an option, I think I’d take it. In that respect, I could call myself “aligned”. (Though if I were only part god and the tools available to me had severe downsides people wouldn’t agree to… that might be a problem.)
I think that’s what an ideally “aligned” god would do… along with possibly taking forceful actions to prevent the world from getting destroyed in the meantime, such as by other nascent gods—which is unfortunately hard to distinguish from unaligned “take over the world” behavior. It would be nice if the world were such that, once one god was created and people proved its abilities, other people would voluntarily stop trying to create their own gods in exchange for getting some value from the first god. It seems like prior agreements to do that would help.
Interestingly, I just discussed a similar issue with a friend, and we came up with a solution. Obviously, an aligned AI cares about people’s subjective opinions, but that doesn’t mean it’s not allowed to talk to them or persuade them. Imagine a list of TED-style videos tailored specifically for you on each pressing issue that would require you to change your mind.
On the one hand, it presumes that people trust the AI enough to be persuaded, but keep in mind that we’re dealing with a smart, tireless agent. The only thing it asks is that you keep talking to it.
The last resort would be to press people with “if you think that’s a bad idea, are you ready to bet that implementing it will make the world worse?” and to create a virtual prediction market between supporters and opponents.
P.S. This all assumes that the AI is a non-violent communicator. There are many ways to pull people’s strings to persuade them; I presume that we know how to distinguish between manipulative and informative persuasion.
A hint on how to do that: the AI should care about people making INFORMED decisions about THEIR subjective future, not about getting their opinions “objectively” right.
Several people have suggested that a sufficiently smart AI, with the ability to talk to a human as much as it wanted, could persuade the human to “let it out of the box” and give it access to the things it needs to take over the world. This seems plausible to me, say at least 10% probability, which is high enough that it’s worth trying to avoid. And it seems to me that, if you know how to make an AI that’s smart enough to be very useful but will voluntarily restrain itself from persuading humans to hand over the keys to the kingdom, then you must have already solved some of the most difficult parts of alignment. Which means this isn’t a useful intermediate state that can help us reach alignment.
Separately, I’ll mention my opinion that the term “non-violent communication” is either subtle trolling or rank hypocrisy. A big chunk of the idea seems to be that you should stick to raw observations and avoid making accusations that would tend to put someone on the defensive… and implying that someone else is committing violence (by communicating in a different style) is one of the most accusatory and putting-them-on-the-defensive things you can do. I’m curious: how many adherents of NVC are aware of this angle on it?
I don’t think NVC tries to put down an opponent; it’s mostly about how you present your ideas. I think it models an opponent as “he tries to win the debate without thinking about my goals. Let me think about both my goals and theirs, so I’m one step ahead.” That is a bit presumptuous and looks down on them, but it is not exactly accusatory.
Okay, hold my gluten-free kefir, boys! Please let me say it in full first without arguments, and then I will try to find more relevant links for each claim. I promise it’s relevant.
Introduction – Enlightenment?
Lately, I have been into hardcore mindfulness practices (see book) aimed at reaching “Enlightenment” in the Buddha’s sense. There are some people who credibly claim they’ve succeeded and who talk about their experience and how to get there (e.g. see this talk and google each of the fellows if it resonates).
My current mental model of “Enlightenment” is as follows:
Evolutionarily, we developed simple lizard brains first, mostly consisting of “register ⇒ process ⇒ decide ⇒ react” loops without much thought. This is similar to the knee reflex, but sometimes a bit more complicated. Our intellectual minds, capable of information processing, memory and superior pattern-matching, came later.
These two systems coexist, and the first one possesses the second. However, the hardware of our brains has general information-processing capabilities and doesn’t require any instant “good-bad” reactionary decision mechanism. Even though that mechanism was “invented” earlier, it’s an ad-hoc part of the system. My metaphor would be a GPU or an ASIC that short-circuits some of the execution to help the CPU process information faster.
However, it makes a big difference to your subjective experience whether that first system is being used or not. Unwinding this circuitry from your default information processing (which, hand-wavily, is “conscious attention” or the “central point”) is what mindfulness is about.
“Enlightenment” is the moment when you relax enough that your brain becomes able (but is not required) to route information flows around the lizard brain and experience sensory stimuli “directly”.
A similar “insight” moment happens when you realize that “money” is just paper, and not the Ultimate Human Value Leaderboard. You can still play along with the illusion of money, you can still earn money, you can still enjoy money, but you can never go back to blindly obeying what capitalism asks of you.
It should be quite obvious why this is good, but let me restate it.
Anxiety goes down and doesn’t control you anymore
Motivation issues go away: the gap between “I want this to happen” and “I find myself doing a different thing” is removed
You don’t care about status and external judgement anymore
You become more caring toward other people’s internal states, but it feels freeing instead of confining
You find yourself in a space between stimulus and reaction
You can research your subjective experience more deeply, e.g. find out how the brain constructs things like the “arrow of time” (answer: it’s lazy-loading)
What does it all have to do with the question?
The first answer is that alignment becomes easier.
I believe that once we normalize this enlightenment thing, and once it becomes a normal part of the human medical care system (or even of child development, like vaccines), the things we think we value and the things we actually value will synchronize much more. E.g. there is a non-trivial number of examples of people losing their addictions after a week of hardcore mindfulness training (see dhamma.org to sign up; it’s completely free and available worldwide).
Personally, alignment feels to me like “remembering” that I always cared about other people but was oblivious to it. It’s like how it’s hard to tune your attention to the music when there’s loud noise around you.
It’s like when there’s a sound that bugs you a lot, but you don’t notice it until it stops. In my case, when I noticed the “sound” (like how my actions hurt other people AND that I don’t enjoy them being hurt) I stopped the behavior myself.
The second answer is even more tentative.
I’ll say it anyway, because it’s too big if true. Again, though, I can’t promise any arguments or verifiable predictions. Read this as an invitation to pick my brain further and try to steelman the position.
Love is the default human mode of perception, and it’s informationally/computationally easy.
Most “enlightened” people report that if you look closely enough, existence consists of only one building block, and that is Pure Universal Love, aka God.
It’s not hidden somewhere or limited; it’s literally everywhere. It’s the same thing as “No-Self” or “True Self”, and “God-realization”. It was there all along and it will exist forever. It is fractally every small piece of reality, and the Reality itself as a whole.
When you really ask yourself what it is that you want, and you skip the default “reactionary” answers, you find that there’s only one course of action that you won’t regret and that you will genuinely enjoy.
For a simpler example, if you pay close attention to what you’re feeling when you smoke, you might find that the nicotine hit is not worth the feeling in your mouth, the smoke in your lungs, the instant slight headache and the coming wave of tiredness. That requires attention and deep inspection, but what it reveals is presumably our real nature.
Similarly, if you closely inspect your interactions with other people, you might find that “winning” against them doesn’t feel good. And “helping” them sometimes doesn’t feel good either. The only thing that deeply, really, genuinely feels good is caring for them. You might still be incentivized not to do that, or you might find yourself in a situation that is impossible to change. But when you look closely enough, there is no uncertainty.
On the one hand, this obviously only tells us that Homo sapiens are agents whose base execution layer is wired to help each other (see Qualia Computing on indirect realism). It makes total sense from an evolutionary standpoint.
However, it also feels computationally easy to do that. It doesn’t feel like work to find “True Love”. It’s not always easy, but when you do it, it feels like a relief, like an undoing of work. Like taking off your coat after coming in from the rain. Finally I get to be free and care about others.
Could this hint that there’s some dynamic that makes it easier to align? That in some specific sense, alignment and cooperation are universally easier than defection?
I am not saying this because I want it to be true. I don’t really believe a computer can accidentally “wake up” to “True Love”.
I am saying this because it might turn out that there’s some invariant at play that makes it easier to wish for low-entropy worlds, or to compute them, or something along those lines.
Finally, answering the original question: yes, I consider myself fully aligned, in the sense of my super-ego caring about each individual’s subjective experience.
In my current state, I don’t always act on that, but whenever I catch myself facing a tough choice, I try to apply the mechanism of “what’s the most obvious answer?”
P.S. Two caveats:
It looks like the societal change needed to integrate this unwind-reactionary-behavior-Enlightenment into normal medical practice is an even bigger project than the AGI alignment program.
Even if we find a chemical that can trigger this change, people would most probably be very reluctant to normalize it (e.g. see MDMA therapy only becoming socially acceptable around now). Most probably we would face the alignment problem sooner than this, and after that it wouldn’t matter.
I might have just gone crazy from meditation and started believing things that are not true. Subjectively, I feel there’s something to it that is very much worth exploring. But it might be similar to the LSD effect where you feel that “you’ve finally got it” but in reality you’re just drawing triangles inscribed in circles.