Thanks, this is helpful!
AmberDawn
I think my question is deeper—why do machines ‘want’ or ‘have a goal to’ follow the algorithm to maximize reward? How can machines ‘find stuff rewarding’?
This might be a crux, because I’m inclined to think they depend on qualia.
Why does AI ‘behave’ in that way? How do engineers make it ‘want’ to do things?
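One mechanical answer (a toy sketch, not a description of any particular system): an AI's 'goal' is usually just an objective function, a number the program computes, and 'wanting' is an update rule that nudges the system's parameters to make that number better. The example below is illustrative only, with made-up numbers:

```python
# Toy illustration: the machine's "goal" is a number we compute, and
# "wanting" is an update rule that pushes that number down. No
# experience or qualia is involved anywhere; it is all arithmetic.

def loss(x):
    # The "goal": make x close to 3.
    return (x - 3.0) ** 2

def gradient(x):
    # Derivative of the loss with respect to x.
    return 2.0 * (x - 3.0)

x = 0.0  # initial parameter
for _ in range(100):
    x -= 0.1 * gradient(x)  # step downhill on the loss

print(x)  # ends up very close to 3
```

From the outside this looks like the system 'wants' x to be 3, because its behaviour reliably moves toward that state; but the description is entirely in terms of numbers and update steps.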
My comment-box got glitchy but just to add: this category of intervention might be a good thing to do for people who care about AI safety and don’t have ML/programming skills, but do have people skills/comms skills/political skills/etc.
Maybe lots of people are indeed working on this sort of thing, I’ve just heard much less discussion of this kind of solution relative to technical solutions.
Yudkowsky writes in his AGI Ruin post:
“We can’t just “decide not to build AGI” because GPUs are everywhere...”
Is anyone thinking seriously about how we might bring about global coordination to not build AGI (at least until we’re confident we can do so safely)? If so, who? If not, why not? It seems like something we should at least try to do, especially if the situation is as dire as Yudkowsky thinks. The sort of thing I’m thinking of is (and this touches on points others have made in their questions):
international governance/regulation
start a protest movement against building AI
do lots of research and thinking about rhetoric and communication and diplomacy, find some extremely charming and charismatic people to work on this, and send them to persuade all actors capable of building AGI to not do it (and to do everything they can to prevent others from doing it)
as someone suggested in another question, translate good materials on why people are concerned about AI safety into Mandarin and other languages
more popularising of AI concerns in English
To be clear, I’m not claiming that this will be easy—this is not a “why don’t we just-” point. I agree with the things Yudkowsky says in that paragraph about why it would be difficult. I’m just saying that it’s not obvious to me that this is fundamentally intractable or harder than solving the technical alignment problem. Reasons for relative optimism:
we seem to have achieved some international cooperation around nuclear weapons—isn’t it theoretically possible to do so around AGI?
there are lots of actors who could build AGIs, but it’s still a limited number. Larger groups of actors do cooperate.
through negotiation and diplomacy, people successfully persuade other people to do things that aren’t even in their interest. AI safety should be a much easier sell because, if developing AGI is really dangerous, it’s in everyone’s interest to stop developing it. There are coordination problems, to be sure, but the fact remains that the AI safety ‘message’ is fundamentally ‘if you stop doing this, we won’t all die’.
This is very basic/fundamental compared to many questions in this thread, but I am taking ‘all dumb questions allowed’ hyper-literally, lol. I have little technical background and though I’ve absorbed some stuff about AI safety by osmosis, I’ve only recently been trying to dig deeper into it (and there’s lots of basic/fundamental texts I haven’t read).
Writers on AGI often talk about AGI in anthropomorphic terms—they talk about it having ‘goals’, being an ‘agent’, ‘thinking’, ‘wanting’, getting ‘rewards’, etc. As I understand it, most AI researchers don’t think that AIs will have human-style qualia, sentience, or consciousness.
But if AIs don’t have qualia/sentience, how can they ‘want things’, ‘have goals’, ‘be rewarded’, etc.? (Since in humans, these things seem to depend on our qualia, and specifically our ability to feel pleasure and pain.)
I first realised that I was confused about this when reading Richard Ngo’s introduction to AI safety, where he talks about reward functions and reinforcement learning. I realised that I don’t understand how reinforcement learning works in machines. I understand how it works in humans and other animals—give the animal something pleasant when it does the desired behaviour and/or painful when it does the bad behaviour. But how can you make a machine without qualia “feel” pleasure or pain?
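For what it’s worth, in machine reinforcement learning the ‘reward’ is not felt at all: it is just a number the environment returns, and ‘learning’ is arithmetic that shifts the agent’s future choices toward actions that produced higher numbers. A minimal sketch (a made-up two-armed bandit example, not from Ngo’s text):

```python
import random

random.seed(0)

# Minimal reinforcement-learning sketch: a two-armed bandit.
# "Reward" is just a float returned by the environment; "learning"
# is an update that moves a value estimate toward that float.
# Nothing in this program experiences pleasure or pain.

q = [0.0, 0.0]   # estimated value of each action
alpha = 0.1      # learning rate
epsilon = 0.1    # probability of trying a random action

def pull(action):
    # Environment: arm 1 pays more on average than arm 0.
    return random.gauss(0.2 if action == 0 else 1.0, 0.1)

for _ in range(2000):
    if random.random() < epsilon:
        a = random.randrange(2)        # explore
    else:
        a = 0 if q[0] >= q[1] else 1   # exploit the current estimates
    r = pull(a)                        # the "reward": just a number
    q[a] += alpha * (r - q[a])         # nudge the estimate toward r

print(q[1] > q[0])  # the agent ends up "preferring" the better arm
```

The anthropomorphic vocabulary (‘wants’, ‘prefers’, ‘is rewarded’) is shorthand for this kind of feedback loop; no claim about inner experience is needed for the loop to work.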
When I talked to some friends about this, I came to the conclusion that this is just a subset of ‘not knowing how computers work’, and it might be addressed by me getting more knowledge about how computers work (on a hardware, or software-communicating-with-hardware, level). But I’m interested in people’s answers here.
I got a Fatebook account thanks to this post!