Imagine an AI system which wipes out humans in order to secure its own power, and later, on reflection, wishes it hadn’t; a wiser system might have avoided taking that action in the first place.
I’m not convinced; this could swing just as easily (if not more so) in the opposite direction: a wiser system with unaligned goals would be more dangerous, not less. I feel moderately confident that wisdom and human-centered ethics are orthogonal categories, and that being wiser therefore does not necessitate greater alignment.
On the topic of the competition itself, are contestants allowed to submit multiple entries?
Multiple entries are very welcome!
[With some kind of anti-munchkin caveat. Submitting your analyses of several different disjoint questions seems great; submitting two versions of largely the same basic content in different styles not so great. I’m not sure exactly how we’d handle it if someone did the latter, but we’d aim for something sensible that didn’t incentivise people to have been silly about it.]
It’s a fair point that wisdom might not be straightforwardly safety-increasing. If someone wanted to explore e.g. assumptions/circumstances under which it is vs isn’t, that would certainly be within scope for the competition.