Seeking feedback on this AI Safety proposal: (I don’t have experience in AI experimentation)
I’m interested in the question, “How can we use smart AIs to help humans with strategic reasoning?”
We don’t want the solution to be, “AIs just tell humans exactly what to do without explaining themselves.” We’d prefer situations where smart AIs can explain to humans how to think about strategy, and that explanation makes humans much better at strategy.
One proposal to make progress on this is to set up a benchmark for having smart AIs help out dumb AIs by providing them with strategic information.
More specifically, we find methods of having GPT-4 give human-understandable prompts to GPT-2 that allow GPT-2 to do as well as possible on specific games like chess.
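To make the setup concrete, here is a minimal sketch of what such a benchmark harness might look like, assuming the python-chess library, with `strong_advice` standing in for GPT-4 and `weak_move` standing in for GPT-2 (both are placeholders; the actual model calls are stubbed out):

```python
# Minimal benchmark-harness sketch (an assumed setup, not a finished design):
# a "strong" advisor writes a short human-readable hint, a "weak" player
# chooses moves, and we compare results with and without the hint.
import random
from typing import Optional

import chess  # python-chess


def strong_advice(board: chess.Board) -> str:
    """Placeholder for GPT-4: return a short, human-readable strategy hint."""
    return "Develop your minor pieces and control the center before attacking."


def weak_move(board: chess.Board, advice: Optional[str]) -> chess.Move:
    """Placeholder for GPT-2: here it just plays a random legal move.
    The real version would be prompted with the board state plus `advice`."""
    return random.choice(list(board.legal_moves))


def play_game(advised_side: Optional[chess.Color], max_plies: int = 200) -> str:
    """Play one game; only `advised_side` (if any) receives advice."""
    board = chess.Board()
    for _ in range(max_plies):
        if board.is_game_over():
            break
        advice = strong_advice(board) if board.turn == advised_side else None
        board.push(weak_move(board, advice))
    return board.result(claim_draw=True)


if __name__ == "__main__":
    # Compare results for an advised White player vs. a fully unadvised baseline.
    advised = [play_game(advised_side=chess.WHITE) for _ in range(20)]
    baseline = [play_game(advised_side=None) for _ in range(20)]
    print("advised wins:", advised.count("1-0"), "baseline wins:", baseline.count("1-0"))
```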
Some improvements/changes would include:
Try to expand the games to include simulations of high-level human problems, like simplified versions of Civilization.
We could also replace GPT-2 with a different LLM that better represents how a human with some amount of specialized knowledge (for example, someone strong at probability) would reason.
There could be a strong penalty for prompts that aren’t human-understandable.
Use actual humans in some experiments. See how much they improve at specific [chess | Civilization] moves when given specific help text.
Instead of using GPT-2, you could likely just use GPT-4. My impression is that GPT-4 is a fair bit worse than the SOTA chess engines. So you use some amplified GPT-4 procedure to figure out the best human-understandable chess prompts, then give those prompts to GPT-4 instances without the amplification.
You set certain information limits. For example, you see how good a job an LLM could do with “100 bits” of strategic information (a rough scoring sketch follows below).
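Here is one way the information limit and the human-understandability penalty mentioned above might be folded into a single score for a candidate advice prompt. The bit measure, readability proxy, and weights are all placeholder assumptions, not a settled metric:

```python
# Sketch of a scoring function for candidate advice prompts (all weights arbitrary).
import zlib


def information_bits(prompt: str) -> float:
    """Crude information measure: size of the compressed prompt, in bits.
    (zlib overhead means even short hints exceed ~100 bits; a real budget
    might use token counts or a different measure.)"""
    return len(zlib.compress(prompt.encode("utf-8"))) * 8


def readability_penalty(prompt: str) -> float:
    """Very rough human-understandability proxy: penalize long, jargon-ish words.
    A real benchmark would likely use human ratings or a readability model."""
    words = prompt.split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    return max(0.0, avg_word_len - 6.0)


def prompt_score(win_rate_gain: float, prompt: str, bit_budget: float = 100.0) -> float:
    """Reward the improvement the advice gives the weak player, minus penalties
    for exceeding the bit budget or being hard to read."""
    over_budget = max(0.0, information_bits(prompt) - bit_budget)
    return win_rate_gain - 0.01 * over_budget - 0.5 * readability_penalty(prompt)


hint = "Trade pieces when ahead; keep your king safe until the endgame."
print(information_bits(hint), readability_penalty(hint), prompt_score(0.15, hint))
```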
A solution would likely involve search processes where GPT-4 experiments with a large space of potential English prompts and tests them over a space of potential chess positions. I assume reinforcement learning could be done here, but perhaps some LLM-heavy mechanism would work better. I’d assume that good strategies would be things like, “In cluster of situations X, you need to focus on optimizing Y.” So the “smart agent” would need to be able to cluster different situations and solve for a narrow prompt for each cluster.
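As a minimal sketch of that search loop, here is a simple hill-climbing search over prompt variants standing in for the fuller RL or LLM-driven mechanism; `propose_variants` (GPT-4 rewriting a prompt) and `evaluate_prompt` (running the game harness above) are placeholder stand-ins:

```python
# Hill-climbing sketch of the prompt search; the real proposal might use RL
# or a more elaborate LLM-driven search, but the loop structure is similar.
import random
from typing import Callable, List


def search_prompts(
    seed_prompt: str,
    propose_variants: Callable[[str], List[str]],
    evaluate_prompt: Callable[[str], float],
    rounds: int = 10,
) -> str:
    """Repeatedly ask the strong model for variants of the current best prompt
    and keep whichever variant scores best on the benchmark."""
    best_prompt, best_score = seed_prompt, evaluate_prompt(seed_prompt)
    for _ in range(rounds):
        for candidate in propose_variants(best_prompt):
            score = evaluate_prompt(candidate)
            if score > best_score:
                best_prompt, best_score = candidate, score
    return best_prompt


# Toy stand-ins so the sketch runs end to end.
def propose_variants(prompt: str) -> List[str]:
    return [prompt + " Prioritize king safety.", prompt + " Look for forcing moves."]


def evaluate_prompt(prompt: str) -> float:
    return random.random()  # real version: advised-minus-unadvised win-rate gain


print(search_prompts("Control the center.", propose_variants, evaluate_prompt))
```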
It’s possible that the best “strategies” would be things like long decision trees. One of the key things to learn is what sorts/representations of information wind up being the densest and most useful.
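For instance, a decision-tree-style strategy could be represented as a nested mapping from situation clusters to sub-advice. This is purely an illustrative data-structure sketch, not a claim about which representation would win:

```python
# Illustrative only: one possible "decision tree" representation of dense advice,
# mapping recognizable situation clusters to short sub-prompts.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AdviceNode:
    advice: str
    children: Dict[str, "AdviceNode"] = field(default_factory=dict)  # keyed by cluster


opening_tree = AdviceNode(
    advice="Develop pieces and castle early.",
    children={
        "opponent_uncastled": AdviceNode("Look for tactics against the exposed king."),
        "closed_center": AdviceNode("Expand on the wing where you have more space."),
    },
)


def advice_for(node: AdviceNode, cluster: Optional[str]) -> str:
    """Return the most specific advice available for the recognized cluster."""
    child = node.children.get(cluster) if cluster else None
    return child.advice if child else node.advice


print(advice_for(opening_tree, "closed_center"))
```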
Zooming out, if we had AIs that we knew could give AIs and humans strong and robust strategic advice in test cases, I imagine we could use some of this for real-life cases, perhaps most importantly to strategize about AI safety.