I don’t have strong takes, but you asked for feedback.
It seems nontrivial that the “value proposition” of collaborating with this brain-chunk is actually net positive. E.g., if it involved giving 10% of the universe to humanity, that’s a big deal. Though I can definitely imagine cases where taking such a trade is good.
It would likely help to get more clarity about why the brain-chunk provides value. Is it because humanity has managed to coordinate to get a vast majority of high-performance compute under the control of a single entity, and access to that compute is what’s being offered? If we’re at that point, I think we probably have many better options (e.g., a long-term moratorium and coordinated safety projects).
Another load-bearing part seems to be the brain-chunk causing the misaligned AI to become or remain somewhat humanity-friendly. What are the mechanisms here? The most obvious one to me is that the AI submits jobs to the cluster along with a thorough explanation of why they will result in a safe successor system, and the brain-chunk is then able to assess these plans and act as a filter, only allowing safer-seeming training runs to happen (a toy sketch of what I mean is below). But if we’re able to accurately assess the viability of safe AGI design plans proposed by human+ level (and potentially malign) AGIs, great, we probably don’t need this complicated scheme where we let a potentially malign AGI undergo RSI.
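To make the mechanism I’m imagining concrete, here’s a minimal toy sketch (my own illustration, not anything from the post; all the names, like TrainingJob and assess_safety_case, are made up). The point is that essentially all of the difficulty lives in the scoring function:

```python
# Toy sketch of the filter mechanism described above -- my own illustration,
# not anything from the post. All names here are hypothetical.

from dataclasses import dataclass

APPROVAL_THRESHOLD = 0.9  # hypothetical confidence bar for letting a run proceed


@dataclass
class TrainingJob:
    config: dict      # the proposed training run (model, data, compute budget)
    safety_case: str  # the AI's explanation of why the successor will be safe


def assess_safety_case(job: TrainingJob) -> float:
    """Score how credible the safety case is, from 0 to 1.

    This is where all the difficulty lives: doing this reliably against a
    human+ level, possibly malign AGI is (arguably) most of the alignment
    problem already.
    """
    raise NotImplementedError("no known way to do this reliably")


def brain_chunk_filter(job: TrainingJob) -> bool:
    """Only allow training runs whose safety case clears the threshold."""
    try:
        return assess_safety_case(job) >= APPROVAL_THRESHOLD
    except NotImplementedError:
        return False  # default-deny when the safety case can't be evaluated
```

The filter itself is trivial; assessing the safety case is the whole problem, which is why the scheme feels somewhat circular to me.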
Again, no strong feelings, but the above do seem like weaknesses. I might have misunderstood things you were saying.
I do wish there were more work thinking about standard trades with misaligned AIs, but perhaps this is going on privately.
Thank you, I think you pointed out some pretty significant oversights in the plan.
I was hoping that the system only needed to provide value during the period where an AI is expanding towards becoming a superintelligent singleton, and that we only really needed to live through that transition. But you’re making me realize that even if we could offer it a positive-sum trade up to that point, it would rationally defect afterwards unless we had changed its goals on a deep level. And like you say, that sort of requires that the system can solve alignment as it goes. I’d been thinking that by shifting its trajectory we could permanently alter its behavior even if we’re not solving alignment. I still think it’s possible we could do that, but probably not in ways that matter for our survival, and probably not in ways that would be easy to predict (e.g., by shifting the AI to build X before Y, something about building X causes it to gain novel understanding which it then leverages; probably not very practically useful, since we don’t know those effects in advance).
I have a rough intuition that the ability to survive the transition to superintelligence still gives humanity more of a chance. In the sense that I expect the AI to be much more heavily resource-constrained early in its timeline, so gaining compounding advantages as early as possible is much more valuable to it; whereas post-superintelligence the value of any given resource may be more incremental. But if that’s the state of things, we still require a continuous positive-sum relationship without alignment, which feels likely impossible to me.