Yes, I agree. It would be safest to use such “AI bombs” for solving hard problems with short and machine-checkable solutions, like proving math theorems, designing algorithms or breaking crypto. There’s not much point for the AI to insert backdoors into the answer if it only cares about the verifier’s response after a trillion cycles, but the really paranoid programmer may also include a term in the AI’s utility function to favor shorter answers over longer ones.
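To make the "favor shorter answers" idea concrete, here is a minimal sketch of what such a utility function could look like. Everything in it is an assumption for illustration: the names `utility`, `is_factor_of_91` and `length_penalty` are made up, the toy verifier just checks for a nontrivial factor of 91, and the particular way of discounting length is only one of many possible choices.

```python
from typing import Callable


def is_factor_of_91(answer: str) -> bool:
    """Toy machine-checkable verifier (hypothetical): accept a nontrivial factor of 91."""
    try:
        f = int(answer)
    except ValueError:
        # Anything that does not parse as an integer is rejected.
        return False
    return 1 < f < 91 and 91 % f == 0


def utility(answer: str,
            verifier: Callable[[str], bool],
            length_penalty: float = 0.01) -> float:
    """Utility is zero unless the verifier accepts; among accepted answers, shorter is better.

    The discount 1 / (1 + penalty * length) is an illustrative choice: it stays
    strictly positive, so any accepted answer still beats any rejected one.
    """
    if not verifier(answer):
        return 0.0
    return 1.0 / (1.0 + length_penalty * len(answer))


# Both answers pass the verifier, but the padded one scores lower.
short_score = utility("7", is_factor_of_91)
padded_score = utility("0000007", is_factor_of_91)
rejected_score = utility("5", is_factor_of_91)
```

The point of the length term is only that padding a correct answer with extra material never pays off, which leaves less room for anything hidden in the output.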