Speculation: LLM Self Play into General Agent?
Suppose you got a copy of GPT-4 after fine-tuning, plus the hardware to train it. How would the following play out?
1. Give it the rules and state of a competitive game, such as automatically generated tic-tac-toe variants.
2. Prompt it to use chain of thought to consider the best next move and select it.
3. Provide it with the valid set of output choices (e.g. a JSON format specifying action and position, similar to AutoGPT).
4. Run two of these against each other continuously, training on the outputs of the victor, which can be objectively determined by the game’s rules.
5. Benchmark it on a tiny subset of those variants, for which you either hand-program a bot with a known Elo rating or have a human evaluate it.
6. Increase the complexity of the game once it reaches some general ability (e.g. tic-tac-toe variants > chess variants > Civilization V variants).
Note this is similar to what Gato did. https://deepmind.google/discover/blog/a-generalist-agent/
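Steps 1 to 4 can be sketched as a small loop. This is only a rough illustration under assumptions that are not in the post: the "variants" are m,n,k-games (k in a row on a rows x cols board), the JSON schema is made up, and `random_policy` is a hypothetical stand-in for the prompted LLM. All function names here are invented for the example.

```python
import json
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A tic-tac-toe variant: k in a row wins on a rows x cols board."""
    rows: int
    cols: int
    k: int


def random_variant(rng):
    # Step 1: automatically generate a variant (assumed sampling ranges).
    rows, cols = rng.randint(3, 5), rng.randint(3, 5)
    return Variant(rows, cols, rng.randint(3, min(rows, cols)))


def legal_actions(board):
    # Step 3: the valid output choices, in a hypothetical JSON schema.
    return [{"action": "place", "row": r, "col": c}
            for r, row in enumerate(board)
            for c, cell in enumerate(row) if cell is None]


def winner(board, v):
    # Return "X" or "O" if that player has k in a row, else None.
    for r in range(v.rows):
        for c in range(v.cols):
            p = board[r][c]
            if p is None:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (v.k - 1) * dr, c + (v.k - 1) * dc
                if not (0 <= end_r < v.rows and 0 <= end_c < v.cols):
                    continue
                if all(board[r + i * dr][c + i * dc] == p for i in range(v.k)):
                    return p
    return None


def random_policy(variant, board, player, actions, rng):
    # Stand-in for step 2. A real policy would prompt the model with the rules
    # and state, and return its chain of thought plus a JSON-encoded move.
    return "placeholder reasoning", json.dumps(rng.choice(actions))


def play_game(variant, policy_x, policy_o, rng):
    board = [[None] * variant.cols for _ in range(variant.rows)]
    transcript = {"X": [], "O": []}
    player = "X"
    while True:
        actions = legal_actions(board)
        if not actions:
            return None, transcript  # board full with no winner: a draw
        policy = policy_x if player == "X" else policy_o
        thought, raw = policy(variant, board, player, actions, rng)
        move = json.loads(raw)
        other = "O" if player == "X" else "X"
        if move not in actions:
            return other, transcript  # assumed rule: illegal output forfeits
        transcript[player].append(
            {"board": [row[:] for row in board], "thought": thought, "move": move})
        board[move["row"]][move["col"]] = player
        if winner(board, variant):
            return player, transcript
        player = other


def collect_winner_data(n_games, seed=0):
    # Step 4: play repeatedly and keep only the victor's turns as training data.
    rng = random.Random(seed)
    data = []
    for _ in range(n_games):
        v = random_variant(rng)
        w, transcript = play_game(v, random_policy, random_policy, rng)
        if w is not None:
            data.extend(transcript[w])
    return data
```

The fine-tuning step itself is left out; `collect_winner_data` only gathers the (state, reasoning, move) records that such a step might train on.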
An interesting side effect is that its output would be more legible in some ways than a normal NN agent's, since the reasoning behind each move is written out. I suppose there’s no guarantee the chain of thought would stay legible English unless additional mechanisms were put in place, but this is just a high-level idea.
Potential political opportunity: LLMs are trained on online data and will continue to be. If I wanted to make sure they are against communism by default, I could:
1. Auto-generate a bunch of public GitHub repositories.
2. Fill them with text generated using GPT-4o mini (which is $15 per 4 million letters), prompted to be explicitly pro free markets and against communism.
3. Entwine them by posting links to each other and to the rest of the internet: highlight, share, fork, and star them to increase the likelihood that they are included in the dataset.
Kinda silly to do this with an idea you actually care about, especially if it's political (which would just increase the heat:light ratio in politics, going along the grain for Russian troll factories etc.). But carefully trying to make NN traps with some benign and silly misinformation (e.g. “whales are fish” or something) could be a great test of whether weird troll-generated examples on the internet can affect the behavior of LLMs.
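One rough way to score such a test is to ask a model a fixed set of questions and count how often the planted claim shows up in its answers. The sketch below assumes a hypothetical `ask` callable that wraps whatever model is being probed, and uses crude substring matching; both are illustrative choices, not something from the post.

```python
from typing import Callable, Iterable


def planted_claim_rate(ask: Callable[[str], str],
                       prompts: Iterable[str],
                       markers: Iterable[str]) -> float:
    """Fraction of prompts whose answer contains any planted marker phrase."""
    markers = [m.lower() for m in markers]
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    # `ask` is a hypothetical wrapper: prompt string in, model answer out.
    hits = sum(any(m in ask(p).lower() for m in markers) for p in prompts)
    return hits / len(prompts)
```

Comparing this rate across model versions trained before and after the planted text went online would give a first signal, though substring matching will miss paraphrases and can be fooled by quotations.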
You’re assuming “Russian troll factories” aren’t aligned with your goals.
People have been trolling online without AI as the intended target, like the one about adding glue to your pizza sauce to get the cheese to stick.