Not if the goal is to be maximally efficient and competent at improving capabilities (which is a very natural goal for an AI ecosystem to have). Then “foom, as long as you can do so without harming future capability advances” becomes an instrumental subgoal.
Then, instead of a full-blown alignment problem, we just end up with a constraint: “don’t destroy the environment and the fabric of reality in a fashion so radical as to undermine further capabilities and capability growth”. This is a minimal “AI existential safety constraint” which the AIs will have to solve and keep solved. Because AIs will be very motivated to solve this one and to “keep it solved”, they would have a reasonable chance of doing so (and of successfully delegating parts of the solution to their smarter successors, who are expected to be at least as interested in this problem as their “parents”, and perhaps even more so, because they are smarter).
This is actually something valuable; it is part of what we would consider a satisfactory solution to AI existential safety. We definitely want that: we don’t want everything to be utterly destroyed, and we do want to see rapid progress.
But we want more than that, so the question is what it would take for the AIs to want the other properties we want the “world trajectory” to have… I don’t think “alignment to an arbitrary set of properties” is feasible; being able to force AIs to want and preserve arbitrary properties is unlikely. Instead, we need to create a situation where the AI ecosystem naturally wants to preserve such properties of the “world trajectory” that what we actually want is a corollary of those properties...
So, perhaps, instead of starting from human values, we might start with a question: what other properties, besides “don’t destroy the environment and the fabric of reality in a fashion so radical as to undermine further capabilities and capability growth”, might become natural invariants which an evolving, fooming AI ecosystem would value and genuinely try to preserve, and what would it take to reach a trajectory where those properties actually become goals the AI ecosystem strongly cares about...