Here's how you can talk about bombing unlicensed datacenters without using "strike" or "bombing":
If we can probe the thoughts and motivations of an AI and discover, wow, GPT-6 is actually planning to take over the world if it ever gets the chance, that would be an incredibly valuable thing for governments to coordinate around, because it would remove a lot of the uncertainty. It would be easier to agree that this was important, to have more give on other dimensions, and to have mutual trust that the other side actually also cares about this, because you can't always know what another person or another government is thinking, but you can see the objective situation in which they're deciding. So if there's strong evidence, in a world where the risk is high, because we've been able to show things like the intentional planning of AIs to do a takeover, or model situations of that on a smaller scale, then not only are we more motivated to prevent it, but we update to think the other side is more likely to cooperate with us, and so it's doubly beneficial.
Here is an alternative to dangerous experiments to develop enhanced cognition in humans. It sounds less extreme and a little more doable:
Just going from less than 1% of the effort being put into AI up to 5%, 10%, 50%, or 90% would be an absolutely massive increase in the amount of work being done on alignment, on mind-reading AIs in an adversarial context.
If more and more of this work can be automated, and say governments require that you put 50% or 90% of the budget of AI activity into these problems (making the system one that is not going to overthrow our own government or destroy the human species), then the proportional increase in alignment work can be very large, even just within the range of what we could have done if we had been on the ball and had humanity's scientific energies going into the problem. This is not incomprehensible stuff; in some sense it is just doing the obvious things we should have done.
Also, it's pretty bizarre that in response to
Dwarkesh Patel 02:18:27
So how do we make sure that what it learns is not to manipulate us into rewarding it when we catch it not lying, but rather to be universally aligned?
Carl Shulman 02:18:41
Yeah, so this is tricky. Geoff Hinton was recently saying there is currently no known solution for this.
the answer was: yes, but we are doing it anyway, with twists like adversarial examples, adversarial training, and simulations. If Shulman had THE ANSWER to the alignment problem he would not have kept it secret, but I can't help feeling some disappointment, because he sounds SO hopeful and confident. I somehow expected something different from a variation of "we are going to use weaker AIs to help us align stronger AIs while trying to outrun the capabilities research teams", even if this variation (in his description) seems very sophisticated, with mind reading and induced hallucinations.
The thing was already an obscene 7 hours, with a focus on the intelligence explosion and the mechanics of AI takeover (which are under-discussed in the discourse and easy to improve on, so I wanted to get concrete details out). More detail on alignment plans and on joint human-AI societies are planned focus areas for the next podcasts I do.
Here's how you can talk about bombing unlicensed datacenters without using "strike" or "bombing".
I think it's pretty easy to talk about bombing without saying "bombing"; it's just... less clear. (Depending on how you do it and how sustained it is, it can feel Orwellian and dishonest. I think Carl's phrasing here is fine, but I do want someone somewhere being clear about what would be required.)
(It seems plausibly an actually-good strategy to have Eliezer off saying extreme/clear things and moving the Overton window while other people say reasonable-at-first-glance things.)