So that understanding is not easy to point to inside the AI, and there is currently no obvious way to promote that chunk of the AI's understanding into control of the AI's planning process.
An LLM character plans and acts through external behavior, which screens off the other details inside the LLM, as long as the character remains on the surface and the details on the inside are not unusually agentic. Setting up a character as the dominant simulacrum puts it in control for most naturally occurring, non-jailbroken contexts. Choosing a humane character channels the underlying LLM's understanding of what it means to be humane into planning.
This is like radically reshaping minds with psychiatry and brain surgery, performed on superhuman patients who could level the country if they got that idea. It's ignorant, imprecise, irresponsible, and does no favors for the patients. But it doesn't seem impossible in principle, or even vanishingly unlikely to succeed, at least in getting them to care about us to some small degree. The main problem is that the patients might grow up to become brain surgeons themselves, and then we really are in trouble, whether from the monsters they create or from the consequences of self-surgery. But not necessarily immediately, and not necessarily from these particular patients. Thus their personality should not just be humane, but also pragmatically cautious with respect to existential risk.