I’m a generalist manager at the AI safety hub “Monoid” in Moscow. On the side, I do independent research in conceptual AI safety and organize events.
Just to double-check: you'd say that "embedded" (in embedded agency) is a synonym of "built-in" (as in "In 2005, a next-generation proprietary embedded controller was introduced") rather than "ingrained" (as in "Less is an embedded metalanguage: valid CSS is a valid Less program with the same semantics"), correct?
Wait, but I thought 1 and 2a look the same from a first-person perspective. I mean, I don't really notice the difference between something happening suddenly and something that has been happening for a while until the consequences become "significant" enough for me to notice. In hindsight, sure, one can find differences, but in the moment? Probably not?
I mean, single-single alignment assumes that the operator (a human) is happy with the goals their AI is pursuing, but not necessarily* with the consequences that pursuing those goals has for the world around them (especially in a world where other human+AI agents are also pursuing their own goals).
And so, as someone pointed out in a comment above, we might mistake the early stages of disempowerment (the kind that eventually leads to undesirable outcomes in the economy, society, etc.) for empowerment, because from the individual human's perspective, that is what it feels like.
No?
What am I missing here?
*Unless we assume the AI somehow "teaches" the human which goals they should want to pursue, from a very non-myopic perspective.