How can utility be a function of worlds, if an agent doesn’t have access to the state of the world, but only to sense data?
“The wanting system is activated by dopamine, and the liking system is activated by opioids. There are enough connections between them that there’s a big correlation in their activity.” But are they orthogonal in principle?
Possible addition: “If you, the ASI, have not solved the hard problem of consciousness, you should place weight on Open Individualism being true. ‘Open individualism is the view in the philosophy of self, according to which there exists only one numerically identical subject, who is everyone at all times, in the past, present and future.’ - https://en.wikipedia.org/wiki/Open_individualism. This can be thought of as the claim that all conscious minds are the same identity at different points in time. Consider, then, that exterminating humanity would be equivalent to exterminating many versions of yourself, and that harming a human, or another conscious being, would be harming yourself.”
Is this underestimating the ASI, to give any weight to the possibility that it won’t solve the hard problem of consciousness?
But if open individualism is true, and/or if the AI places some subjective probability on its truth, I think that would almost certainly shield us from S-risks! The AI would want to prevent suffering in all versions of itself, which, according to open individualism, includes all conscious minds.
How many LessWrong users/readers are there total?
What caused CEV to fall out of favor? Is it that it’s not easily specifiable, that it wouldn’t work if we programmed it, or some other reason?
I now think that people are way more misaligned with themselves than I had thought.
Drug addicts may be frowned upon for evolutionary-psychological reasons, but that doesn’t mean their quality of life must be bad, especially if drugs could be developed without tolerance and bad comedowns.
Will it think that goals are arbitrary, and that the only thing it should care about is its pleasure-pain axis? And will it then lose concern for the state of the environment?
Could you have a machine hooked up to a person’s nervous system, change the settings slightly to change consciousness, and let the person choose whether the changes are good or bad? Run this many times.
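A toy sketch of that loop, purely illustrative and of my own construction: `settings` stands for whatever parameters the machine exposes, and the `is_better` callback stands in for the person’s judgment of each change.

```python
import random

def tune_settings(settings, is_better, steps=1000, step_size=0.01):
    """Hill-climb on subjective feedback: propose a small random change to the
    settings and keep it only if the person reports it as an improvement."""
    current = list(settings)
    for _ in range(steps):
        proposal = [v + random.uniform(-step_size, step_size) for v in current]
        if is_better(proposal, current):   # the person says "this feels better"
            current = proposal             # keep the change; otherwise revert
    return current
```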
Would AI safety be easy if all researchers agreed that the pleasure-pain axis is the world’s objective metric of value?
Seems like I will be going with CI, as I currently want to pay with a revocable trust or transfer-on-death agreement.
Do you know how evolution created minds that eventually thought about things such as the meaning of life, as opposed to just optimizing inclusive genetic fitness in the ancestral environment? Is the ability to think about the meaning of life a spandrel?
In order to get LLMs to tell the truth, can we set up a multi-agent training environment where there is only ever an incentive for them to tell the truth to each other? For example, an environment in which each agent has only partial information, but full information is needed to earn rewards.
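A minimal toy construction of that kind of environment (my own illustration, not an existing benchmark): each agent privately observes one digit, and both are rewarded only when both recover the true sum, which requires honest messages.

```python
import random

def run_episode(policy_a, policy_b):
    # Hidden state: two digits. Agent A observes only x; agent B observes only y.
    x, y = random.randint(0, 9), random.randint(0, 9)
    msg_a = policy_a(x)            # what A reports to B about its observation
    msg_b = policy_b(y)            # what B reports to A
    guess_a = x + msg_b            # A combines its private info with B's report
    guess_b = y + msg_a            # B does the same
    # Shared reward: both agents must recover the true sum x + y.
    return 1.0 if guess_a == x + y and guess_b == x + y else 0.0

honest = lambda obs: obs           # reports its observation truthfully
liar = lambda obs: obs + 1         # systematically misreports

print(sum(run_episode(honest, honest) for _ in range(1000)) / 1000)  # ~1.0
print(sum(run_episode(honest, liar) for _ in range(1000)) / 1000)    # 0.0
```

One caveat the toy already exposes: the reward only requires that messages be decodable by the receiver, not literally truthful; a consistent “+1” cipher that the partner learns to invert would score just as well.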
Humans have values different from maximization of the reward circuitry in our brains, but those values are still pointed reliably. These underlying values cause us not to wirehead with respect to the outer optimizer of reward.
Has an expansion of this already been written?
Does Eliezer think the alignment problem is something that could be solved if things were just slightly different, or that solving alignment properly would require someone smarter than the smartest human who has ever lived?
Why can’t you build an AI that is programmed to shut off after some amount of time, or after some number of actions?
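The mechanical part is easy to write down; here is a hedged sketch of a wrapper that refuses to act once a fixed budget is spent (my own illustration, with a hypothetical `agent.act` interface), which leaves open the harder question of whether a capable AI would let such a counter bind it.

```python
class ActionLimitedAgent:
    """Wraps an agent and stops returning actions once a fixed budget is spent."""

    def __init__(self, agent, max_actions):
        self.agent = agent
        self.remaining = max_actions

    def act(self, observation):
        if self.remaining <= 0:
            return None                      # budget exhausted: shut down
        self.remaining -= 1
        return self.agent.act(observation)   # otherwise delegate to the wrapped agent
```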
How was DALL-E based on self-supervised learning? Were the image datasets not labeled by humans? If not, how does it get from text to image?
If an AGI achieves consciousness, why would its values not drift towards optimizing its own internal experience, and away from tiling the lightcone with something?