Like, I may not want to become a cyborg if I stop being me, but that’s a separate concern from whether it’s bad for alignment (if the resulting cyborg is still aligned).
Oh I see. I was getting at the “it’s not aligned” bit.
Basically, it seems like if I become a cyborg without understanding what I’m doing, the result is one of:
I’m in control
The machine part is in control
Something in the middle
Only the first one seems likely to be sufficiently aligned.
I think “sufficiently” is doing a lot of work here. For example, are we talking about >99% chance that it kills <1% of humanity, or >50% chance that it kills <50% of humanity?
I also don’t think “something in the middle” is the right characterization; I think “something else” is more accurate. I think that the failure you’re pointing at will look less like a power struggle or akrasia and more like an emergent goal structure that wasn’t really present in either part.
I also think that “cyborg alignment” is in many ways a much more tractable problem than “AI alignment” (and in some ways even less tractable, because of pesky human psychology):
It’s a much more gradual problem; a misaligned cyborg (with no agentic AI components) is not directly capable of FOOM (Amdahl’s law was mentioned elsewhere in the comments as a limit on the usefulness of cyborgism, but it’s also a limit on damage; see the rough sketch after this list)
It has been studied longer and has existed longer; all technologies have influenced human thought
It also may be an important paradigm to study (even if we don’t actively create tools for it) because it’s already happening.
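To make the Amdahl’s-law point a bit more concrete (the numbers and the helper function here are my own illustration, not something from the original comments): if the human half of the cyborg still has to do a fraction (1 − p) of the cognitive work serially, then no matter how fast the AI half gets, the overall speedup is capped at 1/(1 − p). That cap limits how useful the cyborg can be, but by the same token it limits how fast a misaligned cyborg can scale up damage.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Amdahl's law: overall speedup when a fraction p of the work
    is accelerated by a factor s and the remaining (1 - p) is not."""
    return 1.0 / ((1.0 - p) + p / s)

# Illustrative numbers only: even if the AI half is 1000x faster,
# a cyborg whose human handles 20% of the work serially tops out near 5x.
print(amdahl_speedup(p=0.8, s=1000))  # ~4.98
print(1 / (1 - 0.8))                  # asymptotic cap: 5.0
```

The same ceiling applies whether the cyborg is doing useful work or harmful work, which is the sense in which it bounds damage as well as usefulness.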