Hmmm. I’m not exactly sure what the disconnect is, but I don’t think you’re quite understanding my model.
I think anti-slop research is very probably dual-use. I expect it to accelerate capabilities. However, I think putting “capabilities” and “safety” on a single scale and trying to maximize differential progress of safety over capabilities is an overly simplistic model which doesn’t capture some important dynamics.
There is not really a precise “finish line”. Rather, we can point to various important events. The extinction of all humans lies down a path where many mistakes (of varying sorts and magnitudes) were made earlier.
Anti-slop AI helps everybody make fewer mistakes. Sloppy AI convinces lots of people to make more mistakes.
My assumption is that frontier labs are racing ahead anyway. The idea is that we’d rather they race ahead with a less-sloppy approach.
Imagine an incautious teenager who is running around all the time and liable to run off a cliff. You expect that if they run off a cliff, they die—at this rate you expect such a thing to happen sooner or later. You can give them magic sneakers that allow them to run faster, but also improve their reaction time, their perception of obstacles, and even their wisdom. Do you give the kid the shoes?
It’s a tough call. Giving the kid the shoes might make them run off a cliff even faster than they otherwise would. It could also allow them to stop just short of the cliff when they otherwise wouldn’t.
I think if you value increased P(they survive to adulthood) over increased E(time they spend as a teenager), you give them the shoes. That is, withholding the shoes values the short term over the long term. If you think there’s no chance of survival to adulthood either way, you don’t hand over the shoes.
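To make that trade-off concrete, here is a minimal toy model (entirely my own formalization, with made-up numbers, not anything from the analogy itself): the teenager has to cross a stretch of ground containing a few cliffs before reaching adulthood. The shoes change two things: travel speed (faster means less expected time spent as a teenager) and the per-cliff chance of falling (wiser means more likely to stop in time). Under these assumptions, the shoes can lower E(time as a teenager) while raising P(survive to adulthood).

```python
# Toy model of the "magic sneakers" trade-off. All numbers are made up
# for illustration; nothing here comes from the original comment.

def outcomes(speed, p_fall, n_cliffs=5, distance=10.0):
    """Return (P(survive to adulthood), E(time spent as a teenager)).

    The teenager must cross `distance` units of ground containing
    `n_cliffs` evenly spaced cliffs. At each cliff they die with
    probability `p_fall`; otherwise they keep going at `speed`.
    """
    segment = distance / n_cliffs            # ground covered between cliffs
    p_survive_all = (1 - p_fall) ** n_cliffs

    expected_time = 0.0
    alive = 1.0                               # probability of reaching this segment
    for _ in range(n_cliffs):
        expected_time += alive * (segment / speed)  # time to walk this segment
        alive *= (1 - p_fall)                       # chance of surviving the cliff
    return p_survive_all, expected_time

# Without shoes: slow but careless. With shoes: fast but perceptive.
no_shoes   = outcomes(speed=1.0, p_fall=0.20)
with_shoes = outcomes(speed=2.0, p_fall=0.05)

print("no shoes:   P(adulthood)=%.2f  E(teen time)=%.2f" % no_shoes)
print("with shoes: P(adulthood)=%.2f  E(teen time)=%.2f" % with_shoes)
```

With these illustrative numbers, the shoes roughly double P(adulthood) while cutting E(teen time) by about a third, so which option you prefer reduces to which of those two quantities you care about—which is the point of the analogy.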