Having said this, I’m open to trying it for one of your arguments. So perhaps you can point me to one that you particularly want engagement on?
Perhaps you could read all three of these posts (they’re pretty short :) and then either write a quick response to each one, so that I can decide which one to dive into, or pick one yourself (one that you find particularly interesting, or have something to say about).
https://www.lesswrong.com/posts/Sn5NiiD5WBi4dLzaB/agi-will-drastically-increase-economies-of-scale
https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy
https://www.lesswrong.com/posts/HTgakSs6JpnogD6c2/two-neglected-problems-in-human-ai-safety
Also, let me know if you prefer to do this here, via email, or text/audio/video chat. (Also, apologies ahead of time for any issues/delays as my kid is home all the time now, and looking after my investments is a much bigger distraction / time-sink than usual, after I updated away from “just put everything into an index fund”.)
My thoughts on each of these. The common thread is that it seems to me you’re using abstractions at way too high a level to be confident that they will actually apply, or that they even make sense in those contexts.
AGIs and economies of scale
Do we expect AGIs to be so competitive that reducing coordination costs is a big deal? I expect that the dominant factor will be AGI intelligence, which will vary enough that changes in coordination costs aren’t a big deal. Variations in human intelligence have a huge effect, and presumably variations in AGI intelligence will be much bigger.
There’s an obvious objection to giving one AGI all of your resources, which is “how do you know it’s aligned”? And this seems like an issue where there’d be unified dissent from people worried about both short-term and long-term safety.
Oh, another concern: if they’re all intent aligned to the same person, then this amounts to declaring that person dictator. Which is often quite a difficult thing to convince people to do.
Consider also that we’ll be in an age of unprecedented plenty, once we have aligned AGIs that can do things for us. So I don’t see why economic competition will be very strong. Perhaps military competition will be strong, but will countries really be converting so much of their economy to military spending that they need this edge to keep up?
So this seems possible, but very far from a coherent picture in my mind.
Some thoughts on metaphilosophy
There are a bunch of fun analogies here. But it is very unclear to me what you mean by “philosophy”, since most, or perhaps all, of your descriptions would be equally applicable to “thinking” or “reasoning”. The model you give of philosophy is also a model of choosing the next move in the game of chess, and countless other things.
Similarly, what is metaphilosophy, and what would it mean to solve it? Reach a dead end? Be able to answer any question? Why should we think that the concept of a “solution” to metaphilosophy makes any sense?
Overall, this post feels like it’s pointing at something interesting, but I don’t know if it actually communicated any content to me. Like, is the point of the sections headed “Philosophy as interminable debate” and “Philosophy as Jürgen Schmidhuber’s General TM” just to say that we can never be certain of any proposition? As written, the post is consistent both with you having some deep understanding of metaphilosophy that I am just not comprehending, and with you using this word in a nonsensical way.
Two Neglected Problems in Human-AI Safety
“There seems to be no reason not to expect that human value functions have similar problems, which even “aligned” AIs could trigger unless they are somehow designed not to.” There are plenty of reasons to think that we don’t have similar problems—for instance, we’re much smarter than the ML systems on which we’ve seen adversarial examples. Also, there are lots of us, and we keep each other in check.
“For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can’t keep up, and their value systems no longer apply or give essentially random answers.” What does this actually look like? Suppose I’m made the absolute ruler of a whole virtual universe—that’s a lot of power. How might my value system “not keep up”?
The second half of this post makes a lot of sense to me, in large part because you can replace “corrupt human values” with “manipulate people”, and then it’s very analogous to problems we face today. Even so, a *lot* of additional work would need to be done to make a plausible case that this is an existential risk.
“An objective that is easy to test/measure (just check if the target has accepted the values you’re trying to instill, or has started doing things that are more beneficial to you)”. Since when was it easy to “just check” someone’s values? Like, are you thinking of an AI reading them off our neurons?
Here’s a slightly stretched analogy to try and explain my overall perspective. If you talked to someone born a thousand years ago about the future, they might make claims like “the most important thing is making progress on metatheology” or “corruption of our honour is an existential risk”, or “once instantaneous communication exists then economies of scale will be so great that countries will be forced to nationalise all their resources”. How do we distinguish our own position from theirs? The only way is to describe our own concepts at a level of clarity and detail that they just couldn’t have managed. So what I want is a description of what “metaphilosophy” is such that it would have been impossible to give an equally clear description of “metatheology” without realising that this concept is not useful or coherent. Maybe that’s too high a target, but I think it’s one we should keep in mind as what is *actually necessary* to reason at such an abstract level without getting into confusion.
I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.
But I can’t do the wrong thing, by my standards of value, if my “value system no longer applies”. So that’s part of what I’m trying to tease out.
Another part is: I’m not sure if Wei thinks this is just a governance problem (i.e. we’re going to put people in charge who do the wrong thing, despite some people advocating caution) or a more fundamental problem that nobody would do the right thing.
If the former, then I’d characterise this more as “more power magnifies leadership problems”. But maybe it won’t, because there’s also a much larger space of morally acceptable things you can do. It just doesn’t seem that easy to me to accidentally cause a moral catastrophe if you’ve got a huge amount of power, and even less so an irreversible one. But maybe this is just because I don’t know whatever examples Wei has in mind.