Is there a body of knowledge about controlling self-modifying programs which could be used as a stepping stone to explaining what would be involved in FAI?
People like me wrote self-modifying machine code programs back in the 1980s—but self-modification quickly went out of fashion. For one thing, you couldn’t run from read-only storage. For another, it made your code difficult to maintain and debug.
Self-modifying code never really came back into fashion. We do have programs writing other programs, though: refactoring, compilers, code-generating wizards and genetic programming.
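As a concrete (if toy) illustration of the "programs writing programs" pattern, as opposed to true self-modification, here is a minimal Python sketch; the `make_scaler` function and its example values are invented here purely for illustration:

```python
# Minimal sketch of "programs writing programs" (code generation), as opposed
# to true self-modification. The generated function and its parameter are
# illustrative only.

def make_scaler(factor):
    """Generate, compile, and return a function specialised for `factor`."""
    source = f"def scale(x):\n    return x * {factor}\n"
    namespace = {}
    exec(source, namespace)        # emit and run freshly generated code
    return namespace["scale"]

triple = make_scaler(3)
print(triple(14))                  # prints 42

# True self-modifying code would instead overwrite its own instructions in
# place, which is exactly why it is hard to maintain and debug: the source
# you read is no longer the code that runs.
```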
Until people figure out how to create reliable self-modifying programs that have modest goals, I’m not going to worry about self-improving AI of any sort being likely any time soon.
Perhaps the rational question is: How far are we from useful self-modifying programs?
Self-modifying programs seem like a bit of a red herring. Most likely, groups of synthetic agents will become capable of improving the design of machine minds before individual machines can do that. So, you would then have a self-improving ecosystem of synthetic intelligent agents.
This probably helps with the wirehead problem, and with any Gödel-like problems associated with a machine trying to understand its entire mind.
Today, companies that build editing, refactoring, lint and similar tools are already using their own software to build the next generation of programming tools. There are still humans in the loop—but the march of automation is working on that gradually.
I agree that a multi-agent systems perspective is the most fruitful way of looking at the problem. And I agree that coalitions are far less susceptible to the pathologies that can arise with mono-maniacal goal systems. A coalition of agents is rational in a different, softer way than is a single unified agent. For example, it might split its charitable contributions among charities. Does that weaker kind of rationality mean that coalitions should be denigrated? I think not.
To answer Nancy’s question, there is a huge and growing body of knowledge about controlling multi-agent systems. Unfortunately, so far as I know, little of it deals with the scenario in which the agents are busily constructing more agents.
That does happen quite a bit in genetic and memetic algorithms—and artificial life systems.
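For concreteness, here is a toy Python sketch of that kind of loop, in which agents construct more agents every generation; the fitness function and mutation scheme are invented for illustration and not taken from any particular system:

```python
# Toy sketch of agents constructing more agents, in the spirit of a genetic
# algorithm / artificial-life loop. The fitness function and mutation scheme
# are invented for illustration only.
import random

def fitness(genome):
    # Hypothetical objective: prefer genomes whose values sum close to 10.
    return -abs(sum(genome) - 10)

def spawn(parent):
    # Each surviving agent "constructs" a new agent: a mutated copy of itself.
    return [g + random.gauss(0, 0.5) for g in parent]

population = [[random.uniform(0, 5) for _ in range(4)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)   # rank agents by fitness
    survivors = population[:10]                  # keep the better half
    population = survivors + [spawn(random.choice(survivors)) for _ in range(10)]

print(max(fitness(g) for g in population))       # best score found
```

The control difficulty is that the set of agents itself keeps changing from one generation to the next, which is exactly the scenario the multi-agent-systems literature mostly does not cover.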
I checked with the Gates Foundation. 7549 grants and counting!
It seems as though relatively united agents can split their charitable contributions too.
A note, though… if I had a billion dollars and decided just to give it to whichever charities GiveWell recommended as their top-rated international charities, then, because most charities have difficulty converting significant extra funds into the same level of effect, I would end up giving 1+10+50+1+0.3+5 = 67.3 million to 6 different charities and then be left wondering what to do with my remaining 932.7 million dollars.
I know the Gates Foundation does look like a coalition of agents rather than a single agent, but it doesn’t look like a coalition of 7549+ agents. I’d guess at most about a dozen and probably fewer Large Components.
Their fact sheet says 24 billion dollars.
Is maintaining sufficient individuality likely to be a problem for the synthetic agents?
Only if they are built to want individuality. We will probably start off with collective systems—because if you have one agent, it is easy to make another one the same, whereas it is not easy to make an agent with a brain twice as big (unless you are trivially adding memory or something). So: collective systems are easier to get off the ground with—they are the ones we are likely to build first.
You can see this in most data centres—they typically contain thousands of small machines, loosely linked together.
Maybe they will ultimately find ways to plug their brains into each other and more comprehensively merge together—but that seems a bit further down the line.
I was concerned that synthetic agents might become so similar to each other that the advantages of different points of view would get lost. You brought up the possibility that they might start out very similar to each other.
If they started out similar, such agents could still come to differ culturally. So, one might be a hardware expert, another might be a programmer, and another might be a tester, as a result of exposure to different environments.
However, today we build computers of various sizes, optimised for different applications—so the result would probably look more like that.
There’s a limit to how similar people can be made to each other, but if there are efforts to optimize all the testers (for example), it could be a problem.
Well, I doubt machines being too similar to each other will cause too many problems. The main case where that does cause problems is with resistance to pathogens—and let’s hope we do a good job of designing most of those out of existence. Apart from that, being similar is usually a major plus point. It facilitates mass production, streamlined and simplified support, etc.
Yes. As tim points out below, the main thing that programmers are taught is “self-modifying programs are almost always more trouble than they’re worth—don’t do it.”
My hunch is that self-modifying AI is far more likely to crash than it is to go FOOM, and that non-self-modifying AI (or AI that self-modifies in very limited ways) may do fairly well by comparison.
My understanding was that the CEV approach is a meta-level approach to stable self-improvement, aiming to design code that outputs what we would want an FAI's code to look like (or something like this). I could certainly be wrong of course, and I have very little to go on here, as the Knowability of FAI and CEV documents are both more vague than I would like (since, of course, the problems are still wide open) and several years old, so I have to piece the picture together indirectly.
If that interpretation is correct, it seems (and I stress that I might be totally off base with this) that stable recursive self-improvement over time is not the biggest conceptual concern; rather, the biggest conceptual difficulty is determining how to derive a coherent goal set from a bunch of Bayesian utility maximizers equipped with each individual person's utility function (and how to extract each person's utility function in the first place), or something like that. Stable self-improving code would then (hopefully) be extrapolated by the resulting CEV, which is actually the initial dynamic.
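To make the aggregation difficulty concrete, here is a toy Python sketch of a naive version of that problem; the people, their utility numbers, and the range-normalisation scheme are all invented here for illustration, and CEV's actual extrapolation dynamic is precisely the part that remains unspecified:

```python
# Toy sketch of the aggregation problem: given each person's utility function
# over options, derive one "coherent" choice. The normalisation-and-summation
# scheme is a naive placeholder, not CEV's actual extrapolation dynamic.

options = ["A", "B", "C"]

# Hypothetical elicited utilities per person (actually extracting these, and
# extrapolating what people *would* want, is the hard, unsolved part).
utilities = {
    "alice": {"A": 3.0, "B": 1.0, "C": 0.0},
    "bob":   {"A": 0.0, "B": 2.0, "C": 5.0},
    "carol": {"A": 1.0, "B": 4.0, "C": 2.0},
}

def normalise(u):
    """Rescale one person's utilities to [0, 1] so no one dominates by scale."""
    lo, hi = min(u.values()), max(u.values())
    return {k: (v - lo) / (hi - lo) for k, v in u.items()}

aggregate = {
    opt: sum(normalise(u)[opt] for u in utilities.values())
    for opt in options
}
print(max(aggregate, key=aggregate.get))  # the naively "coherent" choice
```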
My comment wasn’t directed towards CEV at all—CEV sounds like a sensible working definition of “friendly enough”, and I agree that it’s probably computationally hard.
I was suggesting that any program, AI or no, that is coded to rewrite critical parts of itself in substantial ways is likely to go “splat”, not “FOOM”—to degenerate into something that doesn’t work at all.
This sounds like decision theory stuff that Eliezer and others are trying to figure out.