Solid try, but only #12 seems plausible by my (intuitive, partially trained) model of proof systems and AI, and to do it you would have to formalize what you mean by helpful, harmless, and honest. You’d probably have to use formalizations that aren’t as nice as you’d like. If you want to investigate it, it’s the only one of these that even vaguely seems approachable as a real research plan. I’ve strong downvoted because so many of these are misleadingly useless. Yudkowsky not knowing about it is, in my view, probably because the ideas that work are rare, and he’s stuck on trying to get the best ones and can’t improve from what we’ve got today. Your #12 could use what we’ve got today; if you’re interested, here are some possible starting places. You’ll have to be much better than me at formal reasoning to actually improve SOTA. All I’m good at is finding stuff: I’m librarian character class, and only mediocre at it.
https://www.lesswrong.com/posts/ozojWweCsa3o32RLZ/list-of-links-formal-methods-embedded-agency-3d-world-models#Formal_Training
Interesting that you thought #12 is the easiest to formalize. I would have guessed #14 personally.