Review: Human Compatible by Stuart Russell
I wasn’t a fan of this book, but maybe that’s just because I’m not in the target audience. As a first introduction to AI safety I recommend The AI Does Not Hate You by Tom Chivers (facebook.com/casebash/posts/10100403295741091), and for those who are interested in going deeper I’d recommend Superintelligence by Nick Bostrom. The strongest chapter was his assault on the arguments of those who think we shouldn’t worry about superintelligence, but you can just read that here: https://spectrum.ieee.org/…/many-experts-say-we-shouldnt-wo…
I learned barely anything new from this book. Even when it came to Russell’s own approach, Cooperative Inverse Reinforcement Learning, I felt that the treatment was shallow (I won’t write about this approach until I’ve had a chance to review it directly again). There were a few interesting ideas, which I’ll list below, but I was surprised by how little I’d learned by the end. There’s a decent explanation of some very basic concepts within AI, but the coverage is far too shallow for me to recommend the book on that basis.
Interesting ideas/quotes:
- More processing power won’t solve AI without better algorithms. It simply gets you the wrong answer faster
- Language bootstrapping: Comprehension depends on knowing facts, and extracting facts depends on comprehension. You might think that we could bootstrap an AI using easy-to-comprehend text, but in practice we end up extracting incorrect facts that scramble further comprehension
- We have an advantage in predicting humans because we have a human mind to simulate with; it’ll take longer for AIs to develop this ability
- He suggests that we have a right to mental security and that it is naive to trust that the truth will win out. Unfortunately, he doesn’t address any of the concerns this raises
- By default, a utility maximiser won’t want us to turn it off, as that would interfere with its goals. We could reward it when we turn it off, but that could incentivise it to manipulate us into turning it off. Instead, if the utility maximiser is trying to optimise for our reward function and is uncertain about what that function is, then it will let us turn it off (see the sketch after this list)
- We might decide that we don’t want to satisfy all preferences; for example, we mightn’t feel any obligation to take into account preferences that are sadistic, vindictive or spiteful. But refusing to consider these preferences could have unforeseen consequences: what if envy can’t be ignored as a factor without destroying our self-esteem?
- It’s hard to tell whether an experience has taught someone more about their preferences or changed their preferences (at least without looking into their brain). In either case the response is the same.
- We want robots to treat commands as information about human preferences rather than interpreting them too literally. For example, if I ask a robot to fetch a cup of coffee, I assume that the nearest outlet isn’t in the next city over and that the coffee won’t cost $100. We don’t want the robot to fetch it at all costs.
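To make the off-switch point above more concrete, here is a minimal sketch. It is mine rather than the book’s, and the numbers are made up: a robot that is uncertain about the hidden human utility U of its proposed action compares acting unilaterally, switching itself off, and deferring to a human who will shut it down whenever U is negative. Deferring comes out ahead precisely because of the uncertainty.

```python
# Illustrative sketch (not from the book) of the off-switch argument:
# a robot uncertain about the human's utility U of its proposed action
# compares acting regardless, switching itself off, and deferring to the
# human, who will switch it off whenever U < 0.

import numpy as np

rng = np.random.default_rng(0)

# Robot's belief about the hidden human utility U (illustrative numbers).
samples = rng.normal(loc=0.2, scale=1.0, size=100_000)

act_now    = samples.mean()                 # E[U]: act without asking
switch_off = 0.0                            # shutting itself off is worth 0
defer      = np.maximum(samples, 0).mean()  # E[max(U, 0)]: human vetoes bad actions

print(f"act now:    {act_now:.3f}")
print(f"switch off: {switch_off:.3f}")
print(f"defer:      {defer:.3f}")
# Deferring dominates: E[max(U, 0)] >= max(E[U], 0), with a strict gain
# whenever the robot is genuinely uncertain about the sign of U.
```

The point of the sketch is that allowing itself to be switched off is not a concession the robot makes reluctantly; given uncertainty about what we want, deference is the expected-utility-maximising move.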