It might not be possible to “truly comprehend” the AI’s advanced meta-meta-ethics and whatever compact algorithm replaces the goal-subgoals tree, but the AI most certainly can provide a code of behavior and prove that following it is a really good idea, much like humans might train pets to perform a variety of useful tasks whose true purpose they can’t comprehend. And it doesn’t seem unreasonable that this code of behavior would have the look and feel of an in-depth philosophy of ethics, with some very deep and general compression/procedural mechanisms that seem, to humans, very much like what you’d expect from a true and meaningful set of metaethics, even if it did not correspond much to what’s going on inside the AI. It also probably wouldn’t accidentally trigger hypocrisy-revulsion in the humans, although the AI seeming to follow the code itself is just one of many solutions to that, and probably not a very likely one.
Friendliness is pretty much an entirely tangential issue, and explaining it at equivalent depth would require solving several open questions, unless I’m forgetting something right now. (I probably am.)
There, question dissolved.
Edit: I ended up commenting in a bunch of places in this comment tree, so I feel the need to clarify. I consider both sides here to be making errors, and ended up seeming to favor the shminux side because that’s where I was able to make interesting contributions, and because it made some true tangential claims that were argued against and not defended well. I do not agree with the implications for friendliness, however: you don’t need to understand something to be able to construct true statements about it, or even to powerfully direct its expression so that it has properties you can reference but don’t understand either, especially if you have access to external tools.