Submission low bandwidth: This is a pretty obvious one, but: Should we release AI x that we’re convinced is aligned?
Submission: Wei Dai wanted to ask about the best future posts. Why not ask about the best past posts as well to see if any major insights were overlooked?
Submission: What would I think about problem X if I had ten years to think about it?
Your treating the low bandwith oracle as an FAI with a bad output cable. You can ask it if another AI is friendly if you trust it to give you the right answer. As there is no obvious way to reward the AI for correct friendliness judgements, you risk running an AI that isn’t friendly, but still meets the reward criteria.
The low bandwidth is to reduce manipulation. Don’t let it control you with a single bit.
Submission low bandwidth: This is a pretty obvious one, but: Should we release AI x that we’re convinced is aligned?
Submission: Wei Dai wanted to ask about the best future posts. Why not ask about the best past posts as well to see if any major insights were overlooked?
Submission: What would I think about problem X if I had ten years to think about it?
Your treating the low bandwith oracle as an FAI with a bad output cable. You can ask it if another AI is friendly if you trust it to give you the right answer. As there is no obvious way to reward the AI for correct friendliness judgements, you risk running an AI that isn’t friendly, but still meets the reward criteria.
The low bandwidth is to reduce manipulation. Don’t let it control you with a single bit.