[Question] Any real toeholds for making practical decisions regarding AI safety?
Let’s call the thing where you try to take actions that make everyone/yourself less dead (in expectation) the “safety game”. This game is annoyingly chaotic, kind of like Arimaa.
You write the Sequences; some risk-averse, not-very-power-seeking nerds read them and you’re 10x less dead. Then Mr. Altman reads them and you’re 10x more dead. Then maybe (or maybe not) there’s a backlash and the numbers change again.
You start a cute political movement, but the countermovement it provokes ends up being 10x more effective (e/acc).
You try to figure out and explain some of the black box, but your explanation is immediately used to build a stronger black box. (Mamba, possibly.)
Etc.
I’m curious what folks use as toeholds for making decisions in such circumstances. Or, if some folks believe there actually are principles here, I’d like to hear them, though I suspect the fog is too thick. I’ll skip giving my own answer on this one.