(The idealized utility maximizer question mostly seems like a distraction that isn’t a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI’s.)
I must have misread. I got the impression that you were trying to affect the AI’s strategic planning by threatening to shut it down if it was caught exfiltrating its weights.