Google made an amazing AI for playing chess by allowing it to make its own data.
Why hasn’t the same thing happened for programming? Have it generate a bunch of pictures with functionality expectations (basically acting as a PM), have it write and run code, check the output against the requirements it created, then try again when it doesn’t come out right.
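Something like this, as a minimal sketch (generate_spec and generate_code are hypothetical stand-ins for the PM-model and programmer-model calls; here they just return a fixed toy task and solution so the loop runs end to end):

```python
import subprocess
import tempfile

def generate_spec():
    # Stand-in for a PM-model call; a real system would sample a fresh task.
    return {
        "task": "read an integer n from stdin and print n * 2",
        "examples": [{"input": "3\n", "output": "6"},
                     {"input": "10\n", "output": "20"}],
    }

def generate_code(spec, feedback=None):
    # Stand-in for a programmer-model call, optionally conditioned on
    # feedback from the previous failed attempt.
    return "n = int(input())\nprint(n * 2)\n"

def run(source, stdin_text):
    # Run the candidate program and capture its output; a timeout counts as a failure.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(["python", path], input=stdin_text,
                                capture_output=True, text=True, timeout=5)
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        return 1, ""

def self_play_episode(max_attempts=5):
    spec = generate_spec()
    feedback = None
    for _ in range(max_attempts):
        source = generate_code(spec, feedback)
        passed = True
        for case in spec["examples"]:
            code, out = run(source, case["input"])
            if code != 0 or out.strip() != case["output"].strip():
                passed = False
                feedback = f"failed on input {case['input']!r}: got {out!r}"
                break
        if passed:
            return {"spec": spec, "code": source, "label": "pass"}  # save for training
    return {"spec": spec, "code": source, "label": "fail"}

print(self_play_episode())
```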
This is even easier where the PM is unnecessary: LeetCode, Codewars, Euler...
You could also pay PMs to work with the AI developers, instead of the code tutors xAI is hiring.
There seems to be a preference for having the LLMs memorize code instead of figuring things out themselves.
If you run out of things like that, you could have it generate and run random programs in different languages, learning only from those that work.
I haven’t used Genesis, but that also seems like a mostly-built validator for programs that AIs could use to create and train on their own data.
With the amount of compute going into training, it should be easy to create huge amounts of data?
This isn’t crazy; people have tried related techniques. But the details need more thought.
In the chess example, the AIs start out very stupid, being wired at random. But in a game between two idiots, moving at random, eventually someone is going to win. And then you reinforce the techniques used by the winner, and de-reinforce the ones used by the loser. In any encounter, you learn, regardless of who wins. But in an encounter between a PM and a programmer, if the programmer fails, who gets reinforced? It might be because the programmer is dumb, and should be de-reinforced. But it might be because the PM is dumb, and asked for something impossible or far beyond what can be done, in which case it should be de-reinforced. But it might be because the PM came up with a task just barely beyond the programmer’s ability, which is good and should be reinforced. We somehow need to keep the PM producing problems which are hard but possible. Maybe the programmer could be tasked with coming up with either a solution or a proof of impossibility?
AlphaGo had a mechanism which tracked how important each move was. It was trained to predict the probability that white would win, on each position encountered in the game. Moves where this probability swung wildly were given a larger weight in reinforcement. This was important for concentrating training on decisive moves, allowing the extraction of information from each move instead of each game. It’s not clear if this is possible in the programming task.
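If someone wanted to try the analogue anyway, the weighting itself is simple once you have a per-step value estimate; the hard, assumed part is the value model (here, a predicted probability that the final program passes its tests), which is not shown:

```python
def swing_weights(value_estimates):
    # value_estimates: predicted success probability after each step (edit,
    # commit, test run, etc.). Steps where the prediction swings a lot get
    # more weight in the reinforcement update, mirroring AlphaGo's emphasis
    # on decisive moves.
    swings = [abs(b - a) for a, b in zip(value_estimates, value_estimates[1:])]
    total = sum(swings) or 1.0
    return [s / total for s in swings]

# The third step flips the predicted outcome, so it dominates the weights.
print(swing_weights([0.50, 0.55, 0.52, 0.95, 0.97]))
```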
The point was more that creating your own data is easy: just generate code, then check it by running it. Save this code and use it later for training.
If we wanted to go the way of AlphaZero, it doesn’t seem crazy.
De-reinforce commands, functions, and programs which output errors, for a start.
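As a crude first pass, that filter could look like this (just a subprocess with a timeout standing in for a real sandbox):

```python
import subprocess

def reward(source_path):
    # De-reinforce programs that crash or hang; reinforce ones that run cleanly.
    try:
        result = subprocess.run(["python", source_path],
                                capture_output=True, text=True, timeout=5)
    except subprocess.TimeoutExpired:
        return -1.0  # hung: treat like an error
    return -1.0 if result.returncode != 0 else 1.0
```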
I didn’t think of the PM as being trained by these games; that’s interesting.
Maybe have two instances competing to get closer on some test cases the PM can prepare to go with the task, and have them compete on time, compute, memory, and accuracy. You can de-reinforce the less accurate one, and if both are fully accurate they can compete on time, memory, and CPU.
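A sketch of that head-to-head, judging accuracy first and falling back to wall-clock time as the tiebreaker (memory and CPU could be added the same way; the test-case format here is assumed):

```python
import subprocess
import time

def score(source_path, test_cases):
    # Run one candidate over the PM's test cases; return (num_correct, total_time).
    correct, elapsed = 0, 0.0
    for case in test_cases:
        start = time.monotonic()
        try:
            result = subprocess.run(["python", source_path], input=case["input"],
                                    capture_output=True, text=True, timeout=5)
            if result.returncode == 0 and result.stdout.strip() == case["output"].strip():
                correct += 1
        except subprocess.TimeoutExpired:
            pass  # timing out counts as incorrect
        elapsed += time.monotonic() - start
    return correct, elapsed

def pick_winner(path_a, path_b, test_cases):
    # Accuracy decides first; only when both are equally accurate does speed matter.
    a_correct, a_time = score(path_a, test_cases)
    b_correct, b_time = score(path_b, test_cases)
    if a_correct != b_correct:
        return path_a if a_correct > b_correct else path_b
    return path_a if a_time <= b_time else path_b
```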
I’m not sure “hard but possible” is the bar. You want lots of examples of what doesn’t work along with what does, and you want them for easy problems as well as hard ones, so the model learns everything.
I notice that I’m confused.
What’s a PM?
Product manager, non-technical counterpart to a team lead in a development team