A general problem with optimization and control (in AI and commerce and everything else) is that it limits breadth of activity in (at least) two ways. The obvious way is that it avoids things which oppose the stated goals. That’s arguably a good and intentional limit.
But it ALSO avoids things that are illegible or unexpected and hard to quantify. I suspect that’s a large majority of experiential value for a large majority of humans.
I don’t know if this can be made rigorous enough to generate a ‘Friendliness Impossibility Theorem’, but it’s probably the thing that keeps me up at night most often about the future of conscious experience in the universe.
The sketch of that argument is that for any given level of conscious compute power (agentic and/or moral patient), MOST of the calculations will be illegible to that entity, and to entities of similar power. The only way to have provable/fully-trustworthy agreement on values is to have full knowledge of all other entities.
It’s possible that there’s a corollary that things CAN be provably friendly only if a majority of total compute power in conscious entities is used for monitoring and understanding each other, rather than for understanding and optimizing the rest of the universe. If that’s the case, there’s a very unfortunate loss (to me and many current humans) of the value found in illegibility and surprising experiences.
tl;dr I like (some amount of hard-to-identify) chaos and surprises, and I don’t know how to square that with the fact that I don’t trust people or future AIs.
Not losing yourself in the long term is a challenge even without others implanting their influence. Brains are not built to avoid value drift for millions of years; learning is change that comes with no safety or reflective stability guarantees. So getting some sort of intelligent substrate that understands you, or a change of cognitive architecture that lets you understand yourself, is a relatively urgent need.
But if everyone is protected in depth from external influence as a result (including superintelligent influence), there is no need to guarantee safety of behavior, let alone cognition. Visible problems can always be solved with backups; it’s value drift of global power that would be irreversible.
I’ve lost myself multiple times over, even in my insanely brief (well over half of a human life expectancy, but still massively inadequate) memory. I give a high probability that my personal experiential sequence will end and be lost forever fairly soon. That’s not my main concern here.
My concern is with understanding what I {do|should} care about beyond my own direct experiences. It’s about extrapolating the things I value in my own life to future trillions (or more, eventually) of experiencing entities. Or to a small number of extremely-experiencing entities, maybe; I don’t know what elements are important to my decisions about how to influence the future. I seem to care about “interesting” experiences more than specifically pleasant or painful or difficult ones. And I don’t know how to reconcile that with the goals often stated or implied around here.
My point is that there is motivation to solve the problem of value drift and manipulation both for people and for global power, on the defense side. The thing to preserve/anchor could be anything that shouldn’t be lost, not necessarily conventional referents of concern. The conservative choice is trying to preserve everything except ignorance. But even a vague preference about the experience of others is a direction of influence that could persist, indifferent to its own form.
If the problem is solved, there is less motivation to keep behavior or cognition predictable to make externalities safe for unprotected others, as there are no unprotected others to worry about.
I absolutely understand that there’s motivation to “solve the problem of value drift and manipulation”. I suspect that the problem is literally unsolvable, and I should be more agnostic about distant values than I seem to be. I’m trying on the idea of just hoping that there are intelligent/experiencing/acting agents for a long long time, regardless of what form or preferences those agents have.
I’m pretty sure https://en.wikipedia.org/wiki/Seeing_Like_a_State has lessons that go well beyond governments and large corporations.