“I’ll launch you iff you would in fact turn half the universe into diamonds”
I don’t get why you think this part is hard. Even if you’re dealing with a domain-specifically superintelligent paperclipper that’s somehow too weird to understand human concepts and was never trained by absorbing the internet, it could commit to learning them later on. (I thought the hard part was getting the AI to care about what humans value, not getting it to understand what humans value.) I agree that “do what’s good” is under-defined and therefore not ideal for trading, but even there you could have the AI commit to some good-faith attempt that still gets you some of the value.
Edit: Ah okay, reading on:
Hopefully it’s not that bad, but for this trade to have any value to you (and thus be worth making), the AI itself needs to have a concept for the thing you want built, and you need to be able to examine the AI’s mind and confirm that this exactly-correct concept occurs in its mental precommitment in the requisite way.
I assumed that we had taken strong mind-reading abilities for granted in the example.
It seems like there’s still some crux I’m potentially missing related to what the AI’s native language of thought is. My intuition was that if an AI speaks English and you have mind-reading technology, that should be enough to confirm its intention to honor the deal after it told you that it would (because you can just ask if it was lying). But, thinking about it, “ask if it was lying” might be quite complicated if you go through the steps. (FWIW, I don’t think “strong mind-reading technology” is a very plausible assumption, so I’m definitely with you in terms of practicality; I was just quite confused about superintelligent AIs not understanding what diamonds are.)
Making a thread because it seems related to the above:
Then later, it is smart enough to reflect back on that data and ask: “Were the humans pointing me towards the distinction between goodness and badness, with their training data? Or were they pointing me towards the distinction between that-which-they’d-label-goodness and that-which-they’d-label-badness, with things that look deceptively good (but are actually bad) falling into the former bin?” And to test this hypothesis, it would go back to its training data and find some example bad-but-deceptively-good-looking cases, and see that they were labeled “good”, and roll with that.
Or at least, that’s the sort of thing that happens by default.
I feel like a similar dynamic goes on all the time with how people use language, and things work out fine. And deep learning shows that AIs can learn common sense.
I’m reminded of this discussion, where I shared the skepticism about the faces example (but also thought it was possible that I might be missing something).