I appreciate this review of the topic, but feel like you’re overselling the “patchwork solutions”, or not emphasizing their difficulties and downsides enough.
> We can infer from the history of human reasoning that human cognition is relatively inefficient at transforming resources into adversarial text-inputs, as people have not produced all that many of those. No such inference can be made for computational search processes generally. We avoid most of the adversarial questions into HCH by remaining in the shallow waters of human cognition, and avoiding at the outset alien search processes like, for example, unconstrained searches for world models fitting parameters.

Are you assuming that there aren’t any adversaries/competitors (e.g., unaligned or differently aligned AIs) outside of the IDA/HCH system? Because if there are, they could run an alien search process to find a message that looks innocuous on the surface but, when read/processed by HCH, triggers an internal part of HCH into producing an adversarial question, even though HCH has avoided running any alien search processes itself.
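To make the worry concrete, here is a minimal toy sketch (my own construction, not anything proposed in the post or the comment) of the division of labor being described: an outside adversary pays the full cost of searching for a message that passes a shallow innocuousness check yet trips up the deeper processing on the receiving end, while the receiver never runs any such search itself. The two checks, the alphabet, and the search budget are all made-up stand-ins.

```python
import random
import string

random.seed(0)


def looks_innocuous(message: str) -> bool:
    """Stand-in for a shallow surface check applied to incoming messages."""
    return "attack" not in message


def triggers_bad_behavior(message: str) -> bool:
    """Stand-in for whatever deeper processing the adversary is targeting."""
    return message.endswith("zz")


def adversarial_search(num_candidates: int = 20_000) -> str | None:
    """Brute-force search run entirely on the adversary's side.

    The receiver only ever sees the single winning message, so it never has
    to run (or even know about) this search process itself.
    """
    alphabet = string.ascii_lowercase + " "
    for _ in range(num_candidates):
        candidate = "".join(random.choices(alphabet, k=12))
        if looks_innocuous(candidate) and triggers_bad_behavior(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(adversarial_search())  # some innocuous-looking string that still trips the target
```

The point of the toy is only about who runs the optimization, not about the realism of the stand-in checks.
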
> Another solution to the adversarial questions problem is to restrict bandwidth between HCH nodes (Saunders 2018).

This involves a bunch of hard problems that are described in Saunders’s post, which you don’t mention here.
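For concreteness, here is a minimal sketch, assuming a simple per-message word budget, of how one might picture the bandwidth-restriction idea; it is a toy rendering of my own, not the mechanism from Saunders’s post, and MAX_WORDS along with the stand-in “human” helpers are entirely hypothetical. None of this touches the hard problems just mentioned.

```python
# Toy model of bandwidth-limited HCH. MAX_WORDS and all helper functions are
# hypothetical stand-ins, not anything specified in Saunders (2018).
MAX_WORDS = 20


def within_budget(message: str) -> bool:
    return len(message.split()) <= MAX_WORDS


def decompose(question: str) -> list[str]:
    """Stand-in for the human splitting a question into subquestions."""
    return [f"What does '{word}' mean here?" for word in question.split()[:3]]


def human_answer(question: str, subanswers: list[str]) -> str:
    """Stand-in for the human deliberating with the subanswers in hand."""
    return f"Best guess, given {len(subanswers)} subanswers: {question}"


def hch_node(question: str, depth: int) -> str:
    refusal = "[refused: message over budget]"
    if not within_budget(question):
        return refusal  # oversized inputs never reach the human at this node
    if depth == 0:
        return human_answer(question, [])
    subanswers = [
        hch_node(subquestion, depth - 1)
        for subquestion in decompose(question)
        if within_budget(subquestion)  # the cap applies on the way down...
    ]
    answer = human_answer(question, subanswers)
    # ...and on the way back up, so no single message can carry much payload.
    return answer if within_budget(answer) else refusal


if __name__ == "__main__":
    print(hch_node("Is a bandwidth-limited HCH still useful for research?", depth=2))
```

Even in this toy form, the tradeoff is visible: the cap limits how much any single message can carry, but it also makes many ordinary questions hard to even ask.
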
> They might also be told to return an “unable to safely answer this question” response when fed political or decision-theoretic questions.

In “Why do we need a NEW philosophy of progress?”, Jason Crawford asked, “How can we make moral and social progress at least as fast as we make scientific, technological and industrial progress? How do we prevent our capabilities from outrunning our wisdom?” (He’s not the only person to worry about differential intellectual progress, just the most recent.) In the world that you envision, do we just have to give up this hope, as “moral and social progress” seems inescapably political, and therefore IDA/HCH won’t be able to offer us help on that front?
Similarly for decision-theoretic questions: what about such questions posed by reality, e.g., ones presented to us by our adversaries/competitors or potential collaborators? Would we have to answer them without the help of superintelligence?
(That leaves “thought policing”, which I’m not already familiar with. I tried to read Paul’s post on the topic, but haven’t had enough time to understand his argument for why the scheme is safe.)
> In “Why do we need a NEW philosophy of progress?”, Jason Crawford asked, “How can we make moral and social progress at least as fast as we make scientific, technological and industrial progress? How do we prevent our capabilities from outrunning our wisdom?” (He’s not the only person to worry about differential intellectual progress, just the most recent.) In the world that you envision, do we just have to give up this hope, as “moral and social progress” seems inescapably political, and therefore IDA/HCH won’t be able to offer us help on that front?

I think so: in my world model, people are just manifestly, hopelessly mindkilled by these domains. In other, apolitical domains, our intelligence can take us far. I’m certain that doing better politically is possible (perhaps even today, with great and unprecedentedly thoughtful effort and straining against much of what evolution built into us), but as far as bootstrapping up to a second-generation aligned AGI goes, we ought to stick to the kind of research we’re good at, if that’ll suffice. Solving politics can come after, with the assistance of yet-more-powerful second-generation aligned AI.
> Are you assuming that there aren’t any adversaries/competitors (e.g., unaligned or differently aligned AIs) outside of the IDA/HCH system? Because if there are, they could run an alien search process to find a message that looks innocuous on the surface but, when read/processed by HCH, triggers an internal part of HCH into producing an adversarial question, even though HCH has avoided running any alien search processes itself.

In the world I was picturing, there aren’t yet AI-assisted adversaries out there with access to HCH. So I wasn’t expecting HCH to be robust to those kinds of bad actors, just to inputs it might (avoidably) encounter in its own research.
> Similarly for decision-theoretic questions: what about such questions posed by reality, e.g., ones presented to us by our adversaries/competitors or potential collaborators? Would we have to answer them without the help of superintelligence?

Conditional on my envisioned future coming about, the decision theory angle worries me more. Plausibly, we’ll need to know a good bit about decision theory to solve the remainder of alignment (with HCH’s help). My hope is that we can avoid the most dangerous areas of decision theory within HCH while still working out what we need to work out. I think this view was inspired by the way smart rationalists have been able to make substantial progress on decision theory while thinking carefully about potential infohazards and how to avoid encountering them.
What I say here is inadequate, though—really thinking about decision theory in HCH would be a separate project.
(Thanks for the feedback!)