Sometimes the less well-justified method even wins. TRPO is very principled if you want to "not update too far" from a known-good policy: it maximizes a surrogate objective under an explicit KL-divergence trust-region constraint, solved approximately in practice via a Taylor expansion. PPO replaces that constraint with a heuristic clipped objective; it's less principled, but it works better in practice. It's not clear to me that in ML capabilities one should try to be more like Bengio in having better models, rather than just getting really fast at running experiments and iterating.
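For concreteness, here is a sketch of the contrast, my paraphrase of the original TRPO and PPO papers rather than anything argued in this post, using their standard notation: $\hat A_t$ is an advantage estimate, $\delta$ the trust-region size, and $\epsilon$ the clip range.

$$\max_\theta\ \mathbb{E}_t\!\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat A_t\right] \quad \text{s.t.} \quad \mathbb{E}_t\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\|\,\pi_\theta(\cdot \mid s_t)\big)\right] \le \delta \qquad \text{(TRPO)}$$

$$\max_\theta\ \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat A_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat A_t\big)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \qquad \text{(PPO, clipped)}$$

The constrained form descends from a monotonic-improvement bound (relaxed for practicality); the clip has no comparable guarantee, it just caps how much any single update can exploit the probability ratio. In practice the cruder, first-order method is simpler to implement and tends to perform at least as well, which is the point being made above.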
This seems to have happened in alignment as well, and I especially count RLHF here, along with all the efforts to make AI nice. I think it shows a pretty important point: less justified/principled methods can, and arguably do, win over more principled methods like embedded agency research, a lot of the decision theory research from MIRI, the modern OAA plan from Davidad, or arguably ~all of the research that LessWrong did before roughly 2014-2016.
If you were to be less charitable than I would be, this would explain a lot about why AI safety wants to regulate AI companies so much: the companies are offering at least a partial solution, if not a full solution, to the alignment and safety problems, one that doesn't require much slowdown in AI progress, doesn't require donations to MIRI or classic AI safety organizations, and doesn't require much coordination. That threatens AI safety's funding sources and stokes the fear that its preferred solution, slowing down AI, won't be implemented.
Cf this tweet and the text below:

https://twitter.com/Rocketeer_99/status/1706057953524977740

It's like degrowth or dieting or veganism; people come up with a solution that makes things better but requires personal sacrifice, and then make that solution a cornerstone of personal moral virtue. Once that's your identity, any other solutions to the original problem are evil.
I think this is kind of a non-sequitur, and also wrong in multiple ways. A slowdown can give more time either for work like Davidad's or for improvements to RLHF-like techniques. And most of the AI safety people I know have actual models, based on reasonable assumptions, of why RLHF will stop working.
A basic fact about EA is that it's super consequentialist, and thus less susceptible to this "personal sacrifice = good" mistake than most other groups, and the AI alignment researchers who are not EAs are just normal ML researchers. Just look at the focus on cage-free campaigns over veganism, or on earning-to-give. I'm not saying it's impossible for AI safety researchers to make this mistake, but you have no particular reason to believe they are making it.