I suppose I should have said they ought to make a reasonable attempt. Attempting and failing should be enough to make you worth cooperating with.
Even if that attempt has ~0% chance of working, and a good chance of making the AI unaligned?
Isn’t this assumption the basis of superrationality though? That would be a useless concept if it wasn’t possible for AGIs to prove things about their own reasoning to one another.
Physicists often assume frictionless spheres in a vacuum. It's not that other things don't exist, just that the physicist isn't studying them at the moment. Superrationality explains how agents should behave with mutual knowledge of each other's source code. Is there a more general theory for how agents should behave when they have only limited evidence about each other's source code? Such a theory isn't well understood yet. Superrationality isn't the assumption that all agents know each other's source code (which is blatantly false in general, whether or not it is true between superintelligences able to exchange nanotech space probes). It's just the decision to study agents that know each other's source code as an interesting special case.
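To make that special case concrete, here's a minimal toy sketch of it (my own illustration, not anything from the thread; `clique_bot`, `defect_bot`, and `play` are hypothetical names). The exact-source-match test is the crudest possible stand-in for "proving things about each other's reasoning": real program-equilibrium results use proof search instead, but even the textual-equality version gets mutual cooperation between copies while refusing to be exploited by defectors.

```python
# Toy sketch of agents that can read each other's source code.
# An agent is a function: it receives the opponent's source code and
# returns "C" (cooperate) or "D" (defect) in a one-shot Prisoner's Dilemma.

import inspect


def clique_bot(opponent_source: str) -> str:
    # Cooperate iff the opponent's source is textually identical to mine.
    # Exact equality is the crudest "proof" about the opponent's reasoning;
    # it still yields (C, C) against a copy and (D, D) against a defector.
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"


def defect_bot(opponent_source: str) -> str:
    # Always defect, regardless of who the opponent is.
    return "D"


def play(agent_a, agent_b):
    # Each agent is shown the other's source code, then both move at once.
    src_a = inspect.getsource(agent_a)
    src_b = inspect.getsource(agent_b)
    return agent_a(src_b), agent_b(src_a)


print(play(clique_bot, clique_bot))  # ('C', 'C') -- mutual cooperation
print(play(clique_bot, defect_bot))  # ('D', 'D') -- no exploitation
```

The point of the sketch is only that "mutual knowledge of source code" is a well-posed setting you can actually compute in, not that superintelligences would trade literal source strings.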
That’s because I was talking about the naturally evolved alien civilization at that point, rather than the AGI they create. Assuming I’m right that these tend to be highly social species, they probably have emotional reactions vaguely like our own, so “thinking xyz person is an asshole” is the sort of thing they’d do, and they’d react to it in a predictably negative way regardless of the rational response, the same way humans would.
The human emotional response vaguely resembles TDT-type reasoning. I would expect alien evolved responses to resemble TDT about as much, but in a totally different direction, in the sense that once you know TDT, learning about humans tells you nothing about aliens. Evolution produces somewhat inaccurate maps of the TDT territory, and I don’t expect the same inaccuracies to appear on both maps.
Given all this: do you think something vaguely like this idea is salvageable? Is there some story where we communicate something to other civilizations, and it somehow increases our chances of survival now, which would seem plausible to you?
I don’t know. I mean, in any story where we receive a signal from aliens, that signal could well be helpful or harmful.
We could just broadcast our values into space, and hope the aliens are nice. (Knowing full well that the signals would also help evil aliens be evil.)