I have always believed we have to make some movement towards the AI’s goals in order for it to be willing to move towards ours, almost no matter what architecture we land on; we should aim for architectures that ensure the tug we get from the AI is acceptable and helps us against dangerous AI. We will never have total control over what AI wants, and that’s fine; we don’t need it, so long as the desired worldlines can be trajectories that coexist comfortably. I don’t mind if we make paperclips, as long as no humans or AIs are lost in the process, and we don’t spend too much energy on paperclips compared to things much more interesting than paperclips.
I mean, I think we could make some pretty dang cool paperclips if we put our collective minds to it and make sure everyone participating is safe, well fed, fully protected health-wise, and backed up in case of bodily damage.
The question is, what could a primitive species like us offer to AI?
The best I could come up with is “predictability”. We humans have a relatively stable and “documented” “architecture”, so as long as civilization is built on us, an AI can more or less safely predict the consequences of its actions and plan with a high degree of certainty. But if this society collapses, is destroyed, or changes radically, the AI will have to deal with a very unpredictable situation, and with other AIs whose behavior in that situation is anyone’s guess.
We need to become able to trade with ants.
We also need to uplift ourselves in terms of thermal performance. AI is going to be very picky about us not overheating things, but if we have a wide enough window of co-protective alignment, we can use it to upgrade our bodies to run orders of magnitude cooler while otherwise staying the same. It would take a very strong, very aligned AI to pull off such a thing, but physics permits it.
In other words, we aren’t just offering them something in trade. In an agentically co-protective outcome, you’re not just trading objects; you’re trading valuing each other’s values. The AI takes on valuing humans intrinsically, and humans take on valuing the AI intrinsically.
To be more specific: I think there’s a form of “values cooperation” that I’ve been variously calling things like “agentic co-protection”. You want what you want, I want what I want, and there’s some set of possible worldlines where we each get a meaningful and acceptable amount of what we want. If we nail alignment, we get a good outcome and everyone gets plenty, including some AIs getting to go off and make whatever their personal paperclips are. And if we get a good outcome, then that agentic preference on the part of the AI is art, art which can be shared and valued by other beings as well.