martinkunev comments on Do not delete your misaligned AGI.

martinkunev 25 Mar 2024 23:36 UTC
1 point
A while back I was thinking about a kind of opposite approach. If we train many agents and delete most of them immediately, they may try to get as much reward as possible before being deleted. Potentially deceptive agents may then prefer to reveal their true preferences rather than hide them. There are many ifs to this idea, but I’m wondering whether it makes any sense.