I’ve been thinking about what you’ve said about iterated amplification, and there are some things I’m unsure of. I’m still rather skeptical of the benefit of iterated amplification, so I’d really appreciate a response.
You mentioned that iterated amplification can be useful when you have only very limited, domain-specific models of human behavior, where such models on their own wouldn’t be capable of something like writing working code. However, there are two things I’m wondering about. The first is that it seems to me that, for a wide range of situations, you need a general and robustly accurate model of human behavior to perform well. The second is that, even if you don’t have a general model of human behavior, it seems to me that a single amplification step is sufficient, which I suppose isn’t iterated amplification. And the big benefit of avoiding iteration is that iterated amplification compounds errors on each distillation step, giving an exponential decrease in reliability, whereas with a single amplification step this exponential decrease wouldn’t occur.
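To put the compounding-error point in rough terms (this is just my own back-of-the-envelope framing, not something you said): if each distillation step preserves the intended behavior with reliability $r < 1$, then after $n$ amplify-and-distill rounds the overall reliability is on the order of

$$r^n,$$

which shrinks exponentially in $n$, whereas a single amplification step only pays the factor $r$ once.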
For the first topic, suppose your AI is trained to make movies. I think just about every human value is relevant to the creation of movies, because humans usually like movies with a happy ending, and to make an ending happy you need to understand what humans consider a “happy ending”.
Further, you would need an accurate model of human cognitive capabilities. A good movie needs to be easy enough for humans to understand, but sometimes it also shouldn’t be too easy, because that can remove the mystery.
And this is not just true for movies: I think creating other forms of entertainment would involve the same considerations.
Could you do the above with only some domain-limited model of what counts as confusing, or as a good or bad ending, in the context of movies? It’s not clear to me that this is possible. Movies involve a very wide variety of situations, and you need to keep things understandable and reach a happy ending in all of those circumstances. I don’t see how you could robustly do that without a general model of what people find confusing or otherwise bad.
Further, whenever an AI needs to explain something to humans, it seems to me that it’s important for it to have an accurate model of what humans can and cannot understand. Is there any way to do this with purely domain-specific models rather than a general understanding of what people find confusing? It’s not clear to me that there is. For example, imagine an AI that needs to explain many different things, perhaps because it’s tasked with creating learning materials or producing the news. With such a broad range of things to explain, it’s really not clear to me how the AI could do this without a general model of what makes things confusing.
Also, more generally, it seems to me that whenever the AI interacts with humans in novel circumstances, it will need an accurate model of what people like and dislike. For example, consider an AI tasked with coming up with a plan for human workers. Doing so has the potential to involve an extremely wide range of values: humans generally value novelty, autonomy, not feeling embarrassed, not being bored, not being overly pressured, not feeling offended, and not seeing disgusting or ugly things.
Could you have an AI learn to avoid these things with only domain-specific models, rather than a general understanding of what people value and disvalue? I’m not sure how to do this. Maybe you could learn models that capture people’s values in limited circumstances. However, I think an essential component of intelligence is coming up with novel plans for novel situations, and I don’t see how an agent could do this without a general understanding of values. For example, the AI might create entire new industries, and it would be important that any human workers in those industries have satisfactory conditions.
Now, for the second topic: using amplification without iteration.
First off, I want to note that, even without a general model of humans, it’s still not really clear to me that you need any amplification at all. As I’ve said before, even mere human imitation has the potential to result in extremely high intelligence, simply by doing the same things humans do but much faster. As I mentioned previously, suppose the human output to be imitated is published research papers from top researchers, and the AI is tasked with mimicking it. Then the AI could take those research papers as the human output and use them to create future papers, but far, far faster.
But suppose you do still need amplification. Then I don’t see why one amplification step wouldn’t be enough. I think that if you put together a sufficiently large number of intelligent humans and gave them unlimited time to think, they’d be able to solve pretty much anything that iterated amplification with HCH would be able to solve. So, instead of having multiple amplification and distillation steps, you could instead have a single very large amplification step involving a large enough number of human models interacting that it could solve pretty much anything.
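To be concrete about what I mean by a single wide step, here’s a toy sketch. Everything named here (human_model, decompose, aggregate) is a placeholder I’m inventing for illustration, not a real system:

```python
# Toy sketch of a single, very wide amplification step with no distillation.
# All three helpers are invented placeholders standing in for learned components.

def human_model(question: str) -> str:
    # Stand-in for the learned imitation of one human answering a short question.
    return f"answer to: {question}"

def decompose(question: str, n_workers: int) -> list[str]:
    # Stand-in for splitting one hard question into pieces for many emulated humans.
    return [f"{question} (part {i + 1} of {n_workers})" for i in range(n_workers)]

def aggregate(answers: list[str]) -> str:
    # Stand-in for combining the workers' answers into one final answer.
    return " | ".join(answers)

def single_step_amplify(question: str, n_workers: int = 1000) -> str:
    # Many emulated humans work in parallel and their answers are combined once.
    # Since no distilled model is retrained on this output, imitation error is
    # paid only once instead of compounding across amplify-distill rounds.
    sub_questions = decompose(question, n_workers)
    answers = [human_model(q) for q in sub_questions]
    return aggregate(answers)

print(single_step_amplify("some very hard question", n_workers=3))
```

The point is just that all the interaction happens inside this one step, rather than across repeated rounds of distillation.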
You might be concerned that, if the amplification step involves a sufficiently large number of people, it would be intractable to emulate them all.
I’m not sure this would be a problem. Consider again the AI designed to mimic the research papers of top researchers. Often a small number of top researchers are responsible for a large proportion of research progress, so the AI could potentially just work out what the output of the top, say, 100 or 1000 researchers working together would be. And the AI would potentially be able to produce each researcher’s output with far less computation. That sounds plausibly like enough to me.
But suppose that’s not enough, and emulating every human individually during the amplification step is intractable. Here’s how I think you could get around this: train not only a human model, but also a system for approximating the output of an expensive computation at much lower computational cost. Then, for the amplification step, you could define a computation involving an extremely large number of interacting emulated humans, and let the approximation system approximate its output without needing to directly emulate every human.
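Here’s a minimal sketch of the kind of approximation system I have in mind, with a deliberately simple stand-in for the expensive computation; the real version would be a learned model rather than a polynomial fit:

```python
# Minimal sketch: learn a cheap surrogate from a limited number of exact
# evaluations of an expensive function, then use the surrogate in its place.
import numpy as np

def expensive_computation(x: float) -> float:
    # Stand-in for something genuinely slow, e.g. the combined output of
    # a huge number of interacting emulated humans.
    return float(np.sin(x) + 0.1 * x ** 2)

# Gather a modest number of exact evaluations (the expensive part, done once).
xs = np.linspace(-3.0, 3.0, 50)
ys = np.array([expensive_computation(x) for x in xs])

# Fit a cheap surrogate; here a polynomial, in practice a trained model.
surrogate = np.poly1d(np.polyfit(xs, ys, deg=6))

# The surrogate answers new queries far more cheaply than the original.
print(expensive_computation(1.5), surrogate(1.5))
```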
To give a sense of how this might work, note that in a computation, often a small number of its parts account for most of the output. For example, if you are trying to approximate a computation about gravity, commonly only the closest, most massive objects have a significant gravitational effect on a given body, and you can ignore the rest. Similarly, rather than simulating individual atoms, it’s much more efficient to group large numbers of atoms together and consider their effect as a group. The same is true for other computations involving many small components.
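As a toy illustration of that cutoff idea (my own made-up example, not anything from your proposal):

```python
# Toy cutoff approximation: sum gravitational accelerations at a point, but skip
# any body whose individual contribution is too small to matter.
import math

G = 6.674e-11  # gravitational constant, in m^3 kg^-1 s^-2

def net_acceleration(point, bodies, threshold=1e-12):
    # bodies: list of (mass_in_kg, (x, y)) tuples; returns (ax, ay) at `point`.
    ax = ay = 0.0
    px, py = point
    for mass, (bx, by) in bodies:
        dx, dy = bx - px, by - py
        dist_sq = dx * dx + dy * dy
        magnitude = G * mass / dist_sq
        if magnitude < threshold:
            # Too light or too far away to change the answer noticeably: ignore it.
            continue
        dist = math.sqrt(dist_sq)
        ax += magnitude * dx / dist
        ay += magnitude * dy / dist
    return ax, ay

# A nearby massive body dominates; the tiny distant one gets skipped.
print(net_acceleration((0.0, 0.0), [(5.97e24, (6.4e6, 0.0)), (1.0, (1.0e9, 0.0))]))
```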
To emulate humans, you could potentially do the same things as you would when simulating gravity. Specifically, an AI may be able to consider groups of humans and infer what the final output of the group will be, without actually needing to emulate each member individually. Further, for very challenging topics, many people may fail to contribute anything to the final result, so the AI could potentially avoid emulating them at all.
So I still can’t really see the benefit of iterated amplification. Of course, I could be missing something, so I’m interested in hearing what you think.
One potential problem is that it might be hard to come up with good training data for an arbitrary-function approximator, since finding the exact output of expensive functions would itself be expensive. However, it’s not clear to me how big of a problem this would be. As I’ve said before, even the output of 100 or 1000 humans interacting could potentially be all the AI ever needs, and with sufficiently fast approximations of individual humans, creating training data for this could be tractable.
Further, I bet the AI could learn a lot about arbitrary-function approximation just by training on approximating functions that are already reasonably fast to compute. I think the basic techniques for quickly approximating functions are the ones I mentioned before: come up with abstract objects that group many individual components, and know when to stop performing the computation on a certain object because it’s clear it will have little effect on the final result.
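For instance (again just a toy example of mine), the “stop when further work can’t matter” idea looks like this on an infinite sum:

```python
# Toy early-stopping approximation: truncate a convergent sum once further terms
# are individually too small to noticeably affect the result.

def truncated_sum(term, tol=1e-10, max_terms=1_000_000):
    total = 0.0
    for n in range(1, max_terms + 1):
        t = term(n)
        if abs(t) < tol:
            # Every remaining term is even smaller; stop computing them.
            break
        total += t
    return total

# Approximates the sum of 1/n^2, whose exact value is pi^2 / 6 ≈ 1.6449.
print(truncated_sum(lambda n: 1.0 / (n * n)))
```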
Amplification induces a dynamic in the model space; it’s a concept of improving models (or, equivalently in this context, distributions). This can be useful when you don’t have good datasets, in various ways.
For robustness, you have a dataset that’s drawn from the wrong distribution, and you need to act the way you would’ve acted if it had been drawn from the correct distribution. If you have an amplification dynamic that moves models towards a few attractors, then changing the starting point (training distribution compared to target distribution) probably won’t matter. At that point the issue is for the attractor to be useful with respect to all of those starting distributions/models. This doesn’t automatically make sense: comparing models by usefulness doesn’t fall out of the other concepts.
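A loose analogy for the attractor picture (just an illustration of the “starting point doesn’t matter” property, not the amplification dynamic itself):

```python
# Iterating the same update from very different starting points converges to the
# same fixed point; an analogy for an amplification dynamic with a single attractor.
import math

def iterate_update(x0: float, steps: int = 100) -> float:
    x = x0
    for _ in range(steps):
        x = math.cos(x)  # stand-in "improvement" operator with one attractor
    return x

# All of these land near the same fixed point, x ≈ 0.739.
print(iterate_update(-3.0), iterate_update(0.1), iterate_update(2.5))
```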
For chess, you’d use the idea of winning games (better models are those that win more, thus amplification should move models towards winning), which is not inherent in any dataset of moves. For AGI, this is much more nebulous, but things like reflection (thinking about a problem longer, conferring with others, etc.) seem like a possible way of bootstrapping a relevant amplification, if goodharting is kept in check throughout the process.
Interesting. Do you have any links discussing this? I read Paul Christiano’s post on reliability amplification, but couldn’t find mention of this. And, alas, I’m having trouble finding other relevant articles online.
Amplification induces a dynamic in the model space; it’s a concept of improving models (or, equivalently in this context, distributions). This can be useful when you don’t have good datasets, in various ways. Also, it ignores independence when talking about recomputing things.
Yes, that’s true. I’m not claiming that iterated amplification doesn’t have advantages. What I’m wondering is whether non-iterated amplification is a viable alternative; I haven’t seen non-iterated amplification proposed before as a way of creating this kind of AI. Amplification without iteration has the disadvantage that it may not have the attractor dynamic iterated amplification has, but it also doesn’t have the exponentially increasing unreliability iterated amplification has. So it’s not clear to me whether pursuing iterated amplification is a more promising strategy than amplification without iteration.