I agree that non-agentic AI is a fool's errand when it comes to alignment, but there’s one point where I sort of want to defend it as not being quite as bad as this post suggests:
Me: Okay, so partly you’re pointing out that hardness of the problem isn’t just about getting the AI to do what I want, it’s that doing what I want is actually just really hard. Or rather, the part where alignment is hard is precisely when the thing I’m trying to accomplish is hard. Because then I need a powerful plan, and it’s hard to specify a search for powerful plans that don’t kill everyone.
This depends heavily on the geometry/measure of the space you search over, which in turn depends on the “interface” you have for interacting with the world.
Consider the case of getting rid of cancer. If your interface is a chemical mixture that you inject into the cancer patient(s), it would technically be a valid solution for that chemical mixture to contain nanobots that seize broad power and use it for extensive research that eventually generates a cure for cancer. But this is a complex solution, requiring an enormous number of coordinated pieces. It probably occupies a much smaller part of the search space than genuine solutions, and even the unaligned solutions would tend to be stuff like “this kills the cancer but it also kills the patient”. Further, just from a computational point of view, the power-grabbing solution would be much easier to “catch”, because it suddenly requires modelling huge parts of society, which might be orders of magnitude more expensive than anything that stays within a single patient’s body.
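Here’s a minimal toy sketch of that measure point, under the purely illustrative assumption that candidate plans are random strings of components: a plan class that needs many specific components to line up at once occupies an exponentially thinner slice of the search space, and blind sampling essentially never lands in it. None of the numbers or the “plan = string of components” framing come from the post.

```python
import random

# Toy model: a candidate "plan" is a string of N independent component choices,
# each drawn from an alphabet of size K. A class of plans is characterized by
# how many of those components must simultaneously take specific values
# ("coordinated pieces"); that class occupies a (1/K)**pieces slice of the space.

K = 20            # options available for each component (illustrative)
N = 30            # components in a candidate plan (illustrative)
SAMPLES = 100_000

def fraction_hit(pieces, samples=SAMPLES):
    """Monte Carlo estimate of how often blind search lands in a plan class
    that requires `pieces` specific coordinated component choices."""
    hits = 0
    for _ in range(samples):
        plan = [random.randrange(K) for _ in range(N)]
        # Without loss of generality, the required values are 0 in the
        # first `pieces` slots.
        if all(c == 0 for c in plan[:pieces]):
            hits += 1
    return hits / samples

for pieces, label in [(2, "crude plan (kills the cancer and the patient)"),
                      (8, "nanobot power-grab plan")]:
    print(f"{label}: analytic {K**(-pieces):.1e}, sampled {fraction_hit(pieces):.1e}")
```

With these (made-up) numbers the crude plan class gets hit routinely, while the eight-piece “power-grab” class is never sampled at all in 100,000 draws, which is the point.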
On the other hand, if your interface to the world isn’t a chemical mixture that gets injected into patients, but is instead a computer program that gets uploaded to some server, then a great deal of power-seeking is necessary even to reach a position where you are able to medically affect the cancer patients. The jump from a direct medical solution to even further power-seeking becomes much smaller.
This isn’t just an informal argument; you can take a look at e.g. Alex Turner’s power-seeking proofs and see that they depend deeply on the measure over goals (which is formalized via the choice of symmetry group).
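To gesture at the measure-dependence more concretely, here is my own toy sketch, not Turner’s formal setup; the environment, both goal distributions, and the “power = more reachable terminal states” shorthand are illustrative assumptions. In a tiny two-branch environment, roughly 3/4 of goals drawn from a permutation-symmetric distribution make the option-rich branch optimal, while a distribution concentrated on the other branch makes power-seeking essentially never optimal.

```python
import random

# Tiny deterministic environment: from the start state the agent commits to one
# of two branches. Branch A reaches a single terminal state, a1; branch B keeps
# three terminal states, b1/b2/b3, reachable. "More reachable terminal states"
# is the crude stand-in for power here.
BRANCH_A = ["a1"]
BRANCH_B = ["b1", "b2", "b3"]
TERMINALS = BRANCH_A + BRANCH_B

def prefers_power(reward):
    """True iff an optimal agent with this reward picks the option-rich branch."""
    return max(reward[t] for t in BRANCH_B) > max(reward[t] for t in BRANCH_A)

def fraction_power_seeking(sample_reward, n=100_000):
    return sum(prefers_power(sample_reward()) for _ in range(n)) / n

# Measure 1: rewards i.i.d. uniform over terminal states -- the kind of
# permutation-symmetric measure the power-seeking theorems assume.
def symmetric_goal():
    return {t: random.random() for t in TERMINALS}

# Measure 2: goals concentrated on branch A's single terminal state.
def skewed_goal():
    reward = {t: random.uniform(0.0, 0.1) for t in TERMINALS}
    reward["a1"] = random.uniform(0.9, 1.0)
    return reward

print("symmetric measure:", fraction_power_seeking(symmetric_goal))  # ~0.75
print("skewed measure:   ", fraction_power_seeking(skewed_goal))     # ~0.0
```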
That said, this doesn’t necessarily solve things. Some tasks can be solved very satisfactorily through a narrow interface like this, so that power-seeking isn’t a problem. But there is enormous economic value in more generalized interaction with the world, so there will inevitably be pressure to build genuine agents.