Without thinking about it too much, this fits my intuitive sense. An amoeba can’t possibly demonstrate a high level of incoherence because it simply can’t do a lot of things, and whatever it does would have to be very much in line with its goal (?) of survival and reproduction.
[Question] Is there a way to sort LW search results by date posted?
Thanks for this post. I’ve always had the impression that everyone around LW has been familiar with these concepts since they were kids and now knows them by heart, while I’ve been struggling with some of these concepts for the longest time. It’s comforting to me that there are long-time LWers who don’t necessarily fully understand all of this stuff either.
Browsing through the comments section it seems that everyone relates to this pretty well. I do, too. But I’m wondering whether this applies mostly to an LW subculture, or whether it’s a Barnum/Forer effect that every neurotypical person would also relate to.
With regard to the Seed AI paradigm, most of the publications seem to have come from MIRI (especially the earlier ones, from when it was still called the Singularity Institute), with many discussions happening both here on LessWrong and at events like the Singularity Summit. I’d say most of the thinking around this paradigm happened before the era of deep learning. Nate Soares’ post might provide more context.
You’re right that brain-like AI has not had much traction yet, but it seems to me that there is growing interest in this research area lately (albeit much slower than in the Prosaic AI paradigm), and I don’t think it falls squarely under either the Seed AI paradigm or the Prosaic AI paradigm. Of course there may be considerable overlap between those ‘paradigms’, but I felt brain-like AI was sufficiently distinct to warrant a category of its own, even though I may not think of it as a critical concept in the AI literature.
AI is highly non-analogous with guns.
Yes, especially for consequentialist AIs that don’t behave like tool AIs.
I feel like I broadly agree with most of the points you make, but I also feel like accident vs misuse are still useful concepts to have.
For example, disasters caused by guns could be seen as:
Accidents, e.g. killing people by mistaking real guns for prop guns, which may be mitigated with better safety protocols
Misuse, e.g. school shootings, which may be mitigated with better legislation, better security, etc.
Other structural causes (?), e.g. guns used in wars, which may be mitigated with better international relations
Nevertheless, all of the above are complex and structural in different ways, such that it is often counterproductive or plain misleading to assign blame (or credit) to the causal node directly upstream of the disaster (in this case, guns).
While I agree that the majority of AI risks are neither caused by accidents nor misuse, and that they shouldn’t be seen as a dichotomy, I do feel that the distinction may still be useful in some contexts, e.g. in thinking about what the mitigation approaches could look like.
Upvoted. Though as someone who has been in the corporate world for close to a decade, this is probably one of the rare LW posts that I didn’t learn anything new from. And because every point is so absolutely true and extremely common in my experience, when reading the post I was just wondering the whole time how this is even news.
There are probably enough comments here already, but thanks again for the post, and thanks to the mods for curating it (I would’ve missed it otherwise).
This is a nice post that echoes many points in Eliezer’s book Inadequate Equilibria. In short, it is entirely possible that you outperform ‘experts’ or ‘the market’ if there are reasons to believe that these systems converge to a sub-optimal equilibrium, and even more so when you have more information than the ‘experts’, as in your Wave vs Theorem example.
Thanks for the explanation!
In every scenario, if you have a superintelligent actor which is optimizing the grader’s evaluations while searching over a large real-world plan space, the grader gets exploited.
Like the evaluator-child who’s trying to win his mom’s approval by being close to the gym teacher: how would grader exploitation be different from specification gaming / reward hacking? In theory, wouldn’t a perfect grader solve the problem?
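To check my understanding, here’s a minimal toy sketch of what I take grader exploitation to mean (my own illustration, not from the post; the plan space, the grader’s error term, and all the numbers are made up): an actor that argmaxes an imperfect grader’s score over a wide plan space ends up selecting exactly the plans the grader is most wrong about, even when the grader’s error is negligible on typical plans.

```python
# Toy sketch (my own, not from the post): optimizing an imperfect grader's
# evaluations over a wide plan space picks out the plans the grader is most
# wrong about. All functions and numbers are made up for illustration.
import random

random.seed(0)

def true_value(plan: float) -> float:
    # Hypothetical ground-truth value: the best plan is plan = 1.0.
    return -(plan - 1.0) ** 2

def grader(plan: float) -> float:
    # Imperfect grader: nearly exact on ordinary plans, but it slightly
    # overrates extreme plans (the 0.05 * plan**4 term).
    return true_value(plan) + 0.05 * plan ** 4

plans = [random.uniform(-10.0, 10.0) for _ in range(10_000)]

chosen = max(plans, key=grader)          # what the grader-optimizing actor picks
actual_best = max(plans, key=true_value)

print(f"plan chosen by optimizing the grader: {chosen:6.2f}, "
      f"true value: {true_value(chosen):8.2f}")
print(f"actually best available plan:         {actual_best:6.2f}, "
      f"true value: {true_value(actual_best):8.2f}")
```

In this toy, if grader were literally identical to true_value the argmax would be fine, which is why I’m asking whether a perfect grader would solve the problem in theory.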
In case anyone comes across this post trying to understand the field: Scott Aaronson did a better job than me at describing the “seed AI” and “prosaic AI” paradigms here, which he calls “Orthodox” vs “Reform”.
I’m probably missing something, but doesn’t this just boil down to “misspecified goals lead to reward hacking”?
This post makes sense to me, though it feels almost trivial. I’m puzzled by the backlash against consequentialism; it just feels like people are overreacting. Or maybe the ‘backlash’ isn’t actually as strong as I’m reading it to be.
I’d think of virtue ethics as some sort of equilibrium that society has landed itself in after all these years of being a species capable of thinking about ethics. It’s not the best, but you’d need more than naive utilitarianism to beat it (this EA forum post feels like common sense to me too), which is what you describe as reflective consequentialism. It seems like it all boils down to: be a consequentialist, as long as you 1) account for second-order and higher effects, and 2) account for bad calculations due to corrupted hardware.
Thanks—this helps.
Thanks for the reply!
But I think you can come up with clean examples of capabilities failures if you look at, say, robots that use search to plan; they often do poorly according to the manually specified reward function on new domains because optimizing the reward is too hard for its search algorithm.
I’d be interested to see actual examples of this, if there are any. But also, how would this not be an objective robustness failure if we frame the objective as “maximize reward”?
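For concreteness, here’s a minimal toy sketch of the scenario I picture from your example (entirely my own construction; the 1-D track, reward values, and search depths are made up). The reward below is the manually specified objective, yet the shallow planner scores poorly on it purely because its search is too weak to see the distant +100.

```python
# Toy sketch (my own construction) of a search-based planner that fails at its
# own hand-specified reward because its search is too shallow.

# 1-D track: the agent starts at 0 and makes 10 moves of +1 or -1.
# Reward is collected every time the agent lands on a rewarded position.
REWARD = {1: 1.0, -5: 100.0}  # a nearby snack vs. a faraway jackpot

def lookahead(pos: int, depth: int) -> float:
    """Best reward obtainable within `depth` further moves (exhaustive search)."""
    if depth == 0:
        return 0.0
    return max(REWARD.get(pos + step, 0.0) + lookahead(pos + step, depth - 1)
               for step in (-1, +1))

def run(search_depth: int, horizon: int = 10) -> float:
    """Act greedily with a depth-limited lookahead; return total reward collected."""
    pos, total = 0, 0.0
    for _ in range(horizon):
        step = max((-1, +1),
                   key=lambda s: REWARD.get(pos + s, 0.0)
                   + lookahead(pos + s, search_depth - 1))
        pos += step
        total += REWARD.get(pos, 0.0)
    return total

print("weak search (depth 2): ", run(search_depth=2))   # stays near the +1 (total 5.0)
print("strong search (depth 6):", run(search_depth=6))  # reaches and farms the +100 (total 300.0)
```

In this toy the depth-2 planner’s low score looks like a pure capabilities failure (its search is too weak), but since the thing it fails at is literally “maximize REWARD”, I’m still unsure where the line is between that and an objective robustness failure.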
if you perform Inverse Optimal Control on the behavior of the robot and derive a revealed reward function, you’ll find that its
Do you mean to say that its reward function will be indistinguishable from its policy?
Interesting paper, thanks! If a policy cannot be decomposed into a planning algorithm and a reward function anyway, it’s unclear to me why 2D-robustness would be a better framing of robustness than just 1D-robustness.
Thanks for the example, but why is this a capabilities robustness problem and not an objective robustness problem, if we think of the objective as ‘classify pandas accurately’?
I don’t know how I even got here after so long but I really like this post. Looking forward to next year’s post.
In some industries, Stop Work Authorities are implemented, where any employee at any level of the organisation has the power to stop any work deemed unsafe, at any time. I wonder whether something similar in spirit would be feasible to implement at top AI labs.