In "Why do we need a NEW philosophy of progress?", Jason Crawford asked, "How can we make moral and social progress at least as fast as we make scientific, technological and industrial progress? How do we prevent our capabilities from outrunning our wisdom?" (He's not the only person to worry about differential intellectual progress, just the most recent.) In the world you envision, do we just have to give up this hope, since "moral and social progress" seems inescapably political, and IDA/HCH therefore won't be able to offer us help on that front?
I think so. In my world model, people are just manifestly, hopelessly mindkilled by these domains. In other, apolitical domains, our intelligence can take us far. I'm certain that doing better politically is possible (perhaps even today, with great and unprecedentedly thoughtful effort, straining against much of what evolution built into us), but as far as bootstrapping up to a second-generation aligned AGI goes, we ought to stick to the kinds of research we're good at, if that will suffice. Solving politics can come after, with the assistance of yet-more-powerful second-generation aligned AI.
Are you assuming that there aren't any adversaries/competitors (e.g., unaligned or differently aligned AIs) outside of the IDA/HCH system? Suppose there are: they could run an alien search process to find a message that looks innocuous on the surface but, when read/processed by HCH, triggers some internal part of HCH into producing an adversarial question, even though HCH has avoided running any alien search processes itself.
In the world I was picturing, there aren't yet AI-assisted adversaries out there with access to HCH. So I wasn't expecting HCH to be robust to those kinds of bad actors, just to the inputs it might (avoidably) encounter in the course of its own research.
Similarly with decision-theoretic questions: what about such questions posed by reality, e.g., presented to us by our adversaries/competitors or potential collaborators? Would we have to answer those without the help of superintelligence?
Conditional on my envisioned future coming about, the decision-theory angle worries me more. Plausibly, we'll need to know a good bit about decision theory to solve the remainder of alignment (with HCH's help). My hope is that we can avoid the most dangerous areas of decision theory within HCH while still working out what we need to. I think this view was inspired by the way smart rationalists have made substantial progress on decision theory while thinking carefully about potential infohazards and how to avoid encountering them.
What I say here is inadequate, though—really thinking about decision theory in HCH would be a separate project.
(Thanks for the feedback!)