Why should we expect that as the AI gradually automates us away, it will replace us with better versions of ourselves rather than with non-sentient, or at minimum non-aligned, robots who just do its bidding?
I don’t think we have time to deeply change global culture before AGI arrives.
This is probably true for some extremely high level of superintelligence, but I expect much stupider systems to kill us if any do; I think human-level-ish AGI is already a serious x-risk, and humans aren’t even close to being intelligent enough to do this.
Why do you expect that the most straightforward plan for an AGI to accumulate resources is so illegible to humans? If the plan is designed to be hidden from humans, then it involves modeling them and trying to deceive them. But if not, then it seems extremely unlikely to look like this, as opposed to the much simpler plan of building a server farm. To put it another way, if you planned using a world model as if humans didn’t exist, you wouldn’t make plans involving causing a civil war in Brazil, unless you expect the AI to be modeling the world at an atomic level, which seems computationally intractable, particularly for a machine with the computational resources of the first AGI.
This seems unlikely to be the case to me. However, even if it is the case and so the AI doesn’t need to deceive us, isn’t disempowering humans by force still necessary? Like, if the AI sets up a server farm somewhere and starts to deploy nanotech factories, we could, if not yet disempowered, literally nuke it. Perhaps this exact strategy would fail for various reasons, but more broadly, if the AI is optimizing for gaining resources/accomplishing its goals as if humans did not exist, then it seems unlikely to be able to defend against human attacks. For example, if we think about the ants analogy, ants are incapable of harming us not just because they are stupid, but also because they are extremely physically weak. If humans are faced with physically powerful animals, even if we can subdue them easily, we still have to think about them to do it.
Check out CLR’s research: https://longtermrisk.org/research-agenda. They are focused on answering questions like these because they believe that competition between AIs is a big source of s-risk.
It seems to me that it is quite possible that language models develop into really good world modelers before they become consequentialist agents or contain consequentialist subagents. While I would be very concerned about using an agentic AI to control another agentic AI for the reasons you listed, and so am pessimistic about e.g. debate, AI still seems like it could be very useful for solving alignment.
This seems pretty plausible to me, but I suspect that the first AGIs will exhibit a different distribution of skills across cognitive domains than humans and may also be much less agentic. Humans evolved in environments where the ability to form and execute long-term plans to accumulate power and achieve dominance over other humans was highly selected for. The environments in which the first AGIs are trained may not have this property. That doesn’t mean they won’t develop it, but they may very well not until they are more strongly and generally superintelligent.
They rely on secrecy to gain relative advantages, but in absolute terms, openness increases research speed; it increases the amount of technical information available to every actor.
With regard to God specifically, belief in God is somewhat unique because God is supposed to make certain things good in virtue of his existence; the value of the things religious people value is predicated on the existence of God. In contrast, the value of cake to the kid is not predicated on the actual existence of the cake.
I think this is a good point and one reason to favor more CEV-style solutions to alignment, if they are possible, rather than solutions which make the values of the AI relatively “closer” to our original values.
Or, the other way around, perhaps “values” are defined by being robust to ontology shifts.
This seems wrong to me. I don’t think that reductive physicalism is true (i.e. the hard problem really is hard), but if I did, I would probably change my values significantly. Similarly for religious values; religious people seem to think that God has a unique metaphysical status such that his will determines what is right and wrong, and if no being with such a metaphysical status existed, their values would have to change.
How do you know what that is? You don’t have the ability to stand outside the mind-world relationship and perceive it, any more than anything else. You have beliefs about the mind-world relationship, but they are all generated by inference in your mind. If there were some hard core of non-inferential knowledge about the ontological nature of reality, you might be able to leverage it to gain more knowledge, but there isn’t, because the same objections apply.
I’m not making any claims about knowing what it is. The OP’s argument is that our normal deterministic model is self-refuting because it undermines our ability to have knowledge, so the truth of the model can’t be assumed in the first place.
The point is about correspondence. Neither correlations nor predictive accuracy amount to correspondence to a definite ontology.
Yes, a large range of worlds with different ontologies imply the same observations. The further question of assigning probabilities to those different worlds comes down to how to assign initial priors, which is a serious epistemological problem. However, this seems unrelated to the point made in the OP, which is that determinism is self-undermining.
More broadly, I am confused as to what claim you think that I am making which you disagree with.
By “process,” I don’t mean an internal process of thought involving an inference from perceptions to beliefs about the world; I mean the actual perceptual and cognitive algorithm as a physical structure in the world. Because of the way the brain actually works in a deterministic universe, it ends up correlated with the external world. Perhaps this is unknowable to us “from the inside,” but the OP’s argument is not about external-world skepticism given direct access only to what we perceive, but rather that given normal hypotheses about how the brain works, we should not trust the beliefs it generates. I am simply pointing out that this is false, because these normal hypotheses imply the kind of correlation that we want.
We get correspondence to reality through predictive accuracy; we can predict experience well using science because scientific theories are roughly isomorphic to the structures in reality that they are trying to describe.
Yeah, this is exactly right imo. Thinking about good epistemics as being about believing what is “justified” or what you have “reasons to believe” is unimportant/useless insofar as it departs from “generated by a process that makes the ensuing map correlate with the territory.” In the world where we don’t have free will, but our beliefs are produced deterministically by our observations and our internal architecture in a way such that they are correlated with the world, we have all the knowledge that we need.
While this doesn’t answer the question exactly, I think important parts of the answer include the fact that AGI could upload itself to other computers, as well as acquire resources (minimally money) entirely through the internet (e.g. by investing in stocks). A superintelligent system with access to trillions of dollars and with huge numbers of copies of itself on computers throughout the world more obviously has a lot of potentially very destructive actions available to it than one stuck on a single computer with no resources.
I think this is probably true; I would assign something like a 20% chance of some kind of government action in response to AI aimed at reducing x-risk, and maybe a 5-10% chance that such action is effective enough to meaningfully reduce risk. That being said, 5-10% is a lot, particularly if you are extremely doomy. As such, I think it is still a major part of the strategic landscape even if it is unlikely.