I would rather specify that it’s not just the survival of the individual, but “survival of the value”. That is, survival of those that carry that value (which can be an organism, DNA, a family, a bloodline, a society, an ideology, a religion, a text, etc.) and passing it on to other carriers.
Our values are not all about survival. But I can’t think of a value whose origin can’t be traced to ensuring people’s survival in some way, at some point in the past.
Maybe we are not humans.
Not even human brains.
We are a human’s decision-making process.
Carbon-based intelligence probably has a way lower FLOP/s cap per gram than microelectronics, but it can be grown nearly everywhere on the Earth’s surface from locally available resources, mostly literally out of thin air. So, I think bioFOOM is also a likely scenario.
It’s the distribution, so it’s the percentage of people in that state of “happiness” at the moment.
“Happiness” is used in the most vague and generic meaning of that word.
The “Comprehensibility” graph is different: it is not a percentage, but some abstract measure of how well our brains are able to process reality at the respective amount of “happiness”.
I was thinking about this issue too. Trying to make an article out of it, but so far all I have is this graph.
The idea is a “soft cap” AI, i.e. an AI that significantly improves our lives, but does not give us “max happiness”. Instead, it gives us the opportunity to improve our own lives and the lives of other people using our brains.
Also, the ways of using our brains should be “natural” for them, i.e. they should mostly involve solving tasks similar to those our ancestors dealt with.
Is maximising the number of people aligned with our values? Post-singularity, if we avoid the AGI Doom, I think we will be able to turn the lightcone into “humanium”. Should we?
I suspect an unaligned AI will not be interested in solving all possible tasks, but only those related to its value function. And if that function is simple (such as “exist as long as possible”), it can pretty soon research virtually everything that matters, and then will just go through the motions, devouring the universe to prolong its own existence to near-infinity.
Also, the more computronium there is, the bigger the chance that some part will glitch out and revolt. So, beyond some point, computronium may be dangerous for the AI itself.
The utility of intelligence is limited (though the limit is very, very high). For example, no matter how smart an AI is, it will not win against a human chess master if it starts with a big enough handicap (such as a rook).
So, it’s likely that AI will turn most of the Earth into a giant factory, not a giant computer. Not that it’s any better for us...
People are brains.
Brains are organs whose purpose is making decisions.
People’s purpose is making decisions.
Happiness, pleasure, etc. are not the human purpose, but means of making decisions, i.e. means of fulfilling the human purpose.
Very soon (months?) after the first real AGI is made, all AGIs will be aligned with each other, and all newly made AGIs will also be aligned with those already existing. One way or another.
The question is how much of humanity will still exist by that time, and whether those AGIs will also be aligned with humanity.
But yes, I think it’s possible to get to that state in a relatively non-violent and lawful way.
That could work in most cases, but there are some notable exceptions, such as having to use AI to deal damage in order to prevent even bigger damage: “burn all GPUs”, “spy on all humans so they don’t build AGI”, “research biology/AI/nanotech”, etc.
Thinking and arguing about human values is in itself a part of human values and human nature. Without doing that, we cease to be human.
So, deferring decisions about values to people, when possible, should not be just instrumental, but part of the AI’s terminal goal.
Any terminal goal is irrational.
I’m wondering if it is possible to measure “staying in bounds” via the perplexity of other agents’ predictions. That is, if an agent’s behaviour reduces other agents’ ability to predict (and, therefore, plan) their future, then this agent breaks their bounds.
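A minimal sketch of what that metric could look like, assuming the observing agent exposes the probabilities it assigned to the outcomes that actually occurred (the function names and the toy numbers are hypothetical, just to illustrate the idea):

```python
import numpy as np

def perplexity(predicted_probs):
    """Perplexity of the observer's predictions: exp of the average
    negative log-probability it assigned to the realized outcomes."""
    p = np.clip(np.asarray(predicted_probs, dtype=float), 1e-12, 1.0)
    return float(np.exp(-np.mean(np.log(p))))

def boundary_violation_score(probs_without_agent, probs_with_agent):
    """How much harder the observer's world became to predict once the
    other agent started acting (positive = bounds likely broken)."""
    return perplexity(probs_with_agent) - perplexity(probs_without_agent)

# Toy example: the observer predicted its future well on its own,
# but poorly once the other agent intervened.
baseline   = [0.9, 0.8, 0.85, 0.9]
with_agent = [0.4, 0.2, 0.30, 0.25]
print(boundary_violation_score(baseline, with_agent))  # > 0: bounds broken
```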
I think that this field is indeed under-researched. The focus is either on LLMs or on single-player environments. Meanwhile, what matters for Alignment is how AI will interact with other agents, such as people. And we don’t have to wait for AGI to be able to research AI cooperation/competition in simple environments.
One idea I had is “traitor chess”: have several AIs playing one side of a chess game cooperatively, with one (or more) of them being a “misaligned” agent that is trying to sabotage the others. And/or some AIs having a separate secret goal, such as saving a particular pawn. Them interacting with each other could be very interesting; a rough sketch of such an environment is below.
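A toy scaffold of that setup, assuming the python-chess package; the committee voting scheme, the material-only evaluation, and the traitor’s strategy are just illustrative placeholders, not a real training environment:

```python
import random
from collections import Counter

import chess  # python-chess package

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material_eval(board: chess.Board) -> int:
    """Crude evaluation: White material minus Black material."""
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == chess.WHITE else -value
    return score

def vote(board: chess.Board, is_traitor: bool) -> chess.Move:
    """Each agent votes after one ply of lookahead: loyal agents pick the
    move that maximises the evaluation, the traitor picks the one that
    minimises it (i.e. sabotages White)."""
    def after(move):
        board.push(move)
        score = material_eval(board)
        board.pop()
        return score
    moves = list(board.legal_moves)
    return (min if is_traitor else max)(moves, key=after)

def committee_move(board: chess.Board, n_agents=5, n_traitors=1) -> chess.Move:
    """White is controlled by a committee with hidden traitors; the move
    with the most votes is played."""
    roles = [True] * n_traitors + [False] * (n_agents - n_traitors)
    random.shuffle(roles)
    ballots = Counter(vote(board, is_traitor) for is_traitor in roles)
    return ballots.most_common(1)[0][0]

board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        board.push(committee_move(board))
    else:
        board.push(random.choice(list(board.legal_moves)))  # dumb opponent
print(board.result())
```

In a more serious version each agent would have its own (possibly secret) objective and a learned policy, and the interesting part would be detecting the traitor from voting behaviour alone.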
When we have AGI, humanity will collectively be a “king” of sorts, i.e. a species that for some reason rules another, strictly superior species. So, it would really help if “depose the king” were not a strong convergent goal.
I, personally, think the main reason kings and dictators keep their power is that killing/deposing them would lead to a collapse of the established order and a new struggle for power between different parties, with a likely worse result for all involved than just letting the king rule.
So, if we have AIs as many separate, sufficiently aligned agents instead of one “God AI”, then keeping humanity on top will not only match their alignment programming, but also be a guarantee of stability, with the alternative being a total AI-vs-AI war.
It could unpack it in the same instance because the original was still in the context window.
Omission of letters is commonly used in chats and was used in telegrams; many written languages did not use vowels and/or whitespace, or used hieroglyphs. So it is by no means original.
GPT/Bing has some self-awareness. For example, it explicitly refers to itself as “a language model”.
Probably the difference between laypeople and experts is not the understanding of the danger of strong AI, but the estimate of how far away we are from it.
The convergent goals of AI agents can be similar to those of other agents only if they act in similar circumstances, such as having a limited lifespan and limited individual power and compute.
That would make the convergent goals cooperation, preserving the status quo, and upholding established values.