Though I don’t want to make claims about how common such programs would be.
If you don’t want to make claims about how common such programs are, how do you defend the (implicit) assertion that such programs are worth talking about, especially in the context of the alignment problem?
I don’t want to make claims about how many random programs make explicit predictions about the future to reach their goals. For all I know, it could be 1% or it could be 99%. However, I do make claims about how common other kinds of programs are. I claim that a given random program, regardless of whether it explicitly predicts the future, is unlikely to have the kind of motivational structure that would exhibit instrumental convergence.
Incidentally, I’m also interested in what specifically you mean by “random program”. A natural interpretation is that you’re talking about a program that is drawn from some kind of distribution across the set of all possible programs, but as far as I can tell, you haven’t actually defined said distribution. Without a specific distribution to talk about, any claim about how likely a “random program” is to do anything is largely meaningless, since for any such claim, you can construct a distribution that makes that claim true.
(Note: The above paragraph was originally a parenthetical note on my other reply, but I decided to expand it into its own, separate comment, since in my experience having multiple unrelated discussions in a single comment chain often leads to unproductive conversation.)
Well, good question. Frankly, I don’t think it matters. I don’t believe that my claims are sensitive to the choice of distribution (aside from some convoluted ones), or that giving you a specific distribution would help you defend either position (feel free to prove me wrong). But when I want to feel rigorous, I assume that I’m starting off with a natural length-based distribution over all Turing machines (or maybe all neural networks), then discarding every machine that fails some relatively simple criterion on the output it generates (e.g. does it classify a given set of cat pictures correctly?), keeping the ones that pass, normalizing, and drawing from that.
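To make that concrete, here’s a toy sketch of the construction, with random bit-strings standing in for programs and an arbitrary parity check standing in for the real criterion (both are placeholders invented purely for illustration):

```python
import random

def sample_program(max_len=16):
    # Length-based prior: a length-L string has probability proportional
    # to 2^-L, implemented by flipping a "stop" coin after each bit,
    # as in a prefix-free code.
    bits = []
    while len(bits) < max_len:
        bits.append(random.randint(0, 1))
        if random.random() < 0.5:  # stop with probability 1/2
            break
    return tuple(bits)

def passes_criterion(program):
    # Arbitrary placeholder for "produces the right output on the tests"
    # (e.g. classifies the cat pictures correctly).
    return sum(program) % 2 == 0

def sample_filtered_program():
    # Rejection sampling: draw from the prior and discard failures, so
    # accepted draws follow the normalized, filtered distribution.
    while True:
        p = sample_program()
        if passes_criterion(p):
            return p

random.seed(0)
draws = [sample_filtered_program() for _ in range(5)]
```

The rejection loop is exactly the “discard, keep, normalize” step: no explicit normalization constant is ever computed, but the accepted samples are distributed as if it had been.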
But really, by “random” I mean nearly anything that’s not entirely intentional. To use a metaphor for machine learning: if you pick a random point on the world map and then find the nearest point that’s 2km above sea level, you’ll get a “random” point that’s 2km above sea level. The algorithm has a non-random step, but the outcome is still random in a significant way. The distribution you get is different from the one I described in the previous paragraph (where you simply filtered the initial distribution down to the points at 2km), but they’ll most likely be close.
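Here’s a 1-D toy version of that metaphor, comparing (a) drawing a random point and snapping to the nearest “qualifying” point with (b) filtering the initial distribution down to the qualifying points directly (the qualifying set is an arbitrary stand-in for “points at 2km elevation”):

```python
import random
from collections import Counter

points = list(range(100))
qualifying = [10, 11, 50, 90]  # stand-in for "points at 2km elevation"

def nearest_qualifying(x):
    # Non-random step: snap the random draw to the closest qualifying point.
    return min(qualifying, key=lambda q: abs(q - x))

random.seed(0)
n = 10_000

# (a) Pick a random point, then snap to the nearest qualifying one.
snapped = Counter(nearest_qualifying(random.choice(points)) for _ in range(n))

# (b) Filter the initial distribution down to the qualifying points.
filtered = Counter(random.choice(qualifying) for _ in range(n))

# Both land only on qualifying points, but with different weights:
# "snap" weights each point by the size of its basin of attraction,
# while "filter" is uniform over the qualifying set here.
```

As the comparison shows, the two procedures agree on the support of the distribution but can disagree on the weights, which is the sense in which the two distributions are “different but close”.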
Maybe that answers your other comment too?

I claim that a given random program, regardless of whether it explicitly predicts the future, is unlikely to have the kind of motivational structure that would exhibit instrumental convergence.

Yes, I understand that. What I’m more interested in knowing, however, is how this statement connects to AI alignment in your view, since any AI created in the real world will certainly not be “random”.