Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don't profess to know the "correct" answer to, if there even is a "correct" answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and asking it to explain its reasoning, should provide some insight into either its actual ethical reasoning process or the one it "wants" to present to us as having.
These questions are in part an attempt to set some kind of bar for an AGI+ to pass, at least toward showing it's not obviously misaligned. The result will either be that it obviously failed, or that it gave us sufficiently reasonable answers plus explanations that it "might have passed."
The other reason for these questions is that I plan to use them to test an “ethics calculator” I’m working on that I believe could help with development of aligned AGI+.
(By the way, I’m not sure that we’ll ever get nearly all humans to agree on what “aligned” actually looks like/means. “What do you mean it won’t do what I want?!? How is that ‘aligned’?! Aligned with what?!”)