Can we instill heuristics into AI to lock down some dangerous routes of thinking? For example, can we make it assume that “thinking about microbiology or nanotech does not lead to anything interesting” or “if I make a copy of myself, it will be hostile to me”?