Without thinking about it too much, this fits my intuitive sense. An amoeba can’t possibly demonstrate a high level of incoherence because it simply can’t do a lot of things, and whatever it does would have to be very much in line with its goal (?) of survival and reproduction.
[Question] Is there a way to sort LW search results by date posted?
Thanks for this post. I’ve always had the impression that everyone around LW has been familiar with these concepts since they were kids and now knows them by heart, while I’ve been struggling with some of these concepts for the longest time. It’s comforting to me that there are long-time LWers who don’t necessarily fully understand all of this stuff either.
Browsing through the comments section it seems that everyone relates to this pretty well. I do, too. But I’m wondering whether this applies mostly to an LW subculture, or whether it’s a Barnum/Forer effect that every neurotypical person would also relate to.
With regard to the Seed AI paradigm, most of the publications seem to have come from MIRI (especially the earlier ones, from when it was still called the Singularity Institute), with many discussions happening both here on LessWrong and at events like the Singularity Summit. I’d say most of the thinking around this paradigm happened before the era of deep learning. Nate Soares’ post might provide more context.
You’re right that brain-like AI has not had much traction yet, but it seems to me that there is growing interest in this research area lately (albeit much slower than in the Prosaic AI paradigm), and I don’t think it falls squarely under either the Seed AI paradigm or the Prosaic AI paradigm. Of course there may be considerable overlap between those ‘paradigms’, but I felt brain-like AI was sufficiently distinct to warrant a category of its own, even though I may not think of it as a critical concept in the AI literature.
AI is highly non-analogous with guns.
Yes, especially for consequentialist AIs that don’t behave like tool AIs.
I feel like I broadly agree with most of the points you make, but I also feel like accident vs misuse are still useful concepts to have.
For example, disasters caused by guns could be seen as:
Accidents, e.g. killing people by mistaking real guns for prop guns, which may be mitigated with better safety protocols
Misuse, e.g. school shootings, which may be mitigated with better legislation, better security, etc.
Other structural causes (?), e.g. guns used in wars, which may be mitigated with better international relations
Nevertheless, all of the above are complex and structural in different ways, such that it is often counterproductive or plain misleading to assign blame (or credit) to the causal node directly upstream of the disaster (in this case, guns).
While I agree that the majority of AI risks are neither caused by accidents nor misuse, and that they shouldn’t be seen as a dichotomy, I do feel that the distinction may still be useful in some contexts, e.g. in thinking about what the mitigation approaches could look like.
Upvoted. Though as someone who has been in the corporate world for close to a decade, this is probably one of the rare LW posts that I didn’t learn anything new from. And because every point is so absolutely true and extremely common in my experience, when reading the post I was just wondering the whole time how this is even news.
There are probably enough comments here already, but thanks again for the post, and thanks to the mods for curating it (I would’ve missed it otherwise).
This is a nice post that echoes many points in Eliezer’s book Inadequate Equilibria. In short, it is entirely possible that you outperform ‘experts’ or ‘the market’ if there are reasons to believe that these systems converge to a sub-optimal equilibrium, and even more so when you have more information than the ‘experts’, as in your Wave vs Theorem example.
Thanks for the explanation!
In every scenario, if you have a superintelligent actor which is optimizing the grader’s evaluations while searching over a large real-world plan space, the grader gets exploited.
Like the evaluator-child who’s trying to win his mom’s approval by being close to the gym teacher: how would grader exploitation be different from specification gaming / reward hacking? In theory, wouldn’t a perfect grader solve the problem?
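To check my understanding, here’s a minimal toy sketch of what I take grader exploitation to mean (my own illustration, not from the post; the plan space, the grader’s error term, and all the numbers are made up): an actor that argmaxes an imperfect grader’s score over a wide plan space ends up selecting exactly the plans the grader is most wrong about, even when the grader’s error is negligible on typical plans.

```python
# Toy sketch (my own, not from the post): optimizing an imperfect grader's
# evaluations over a wide plan space picks out the plans the grader is most
# wrong about. All functions and numbers are made up for illustration.
import random

random.seed(0)

def true_value(plan: float) -> float:
    # Hypothetical ground-truth value: the best plan is plan = 1.0.
    return -(plan - 1.0) ** 2

def grader(plan: float) -> float:
    # Imperfect grader: nearly exact on ordinary plans, but it slightly
    # overrates extreme plans (the 0.05 * plan**4 term).
    return true_value(plan) + 0.05 * plan ** 4

plans = [random.uniform(-10.0, 10.0) for _ in range(10_000)]

chosen = max(plans, key=grader)          # what the grader-optimizing actor picks
actual_best = max(plans, key=true_value)

print(f"plan chosen by optimizing the grader: {chosen:6.2f}, "
      f"true value: {true_value(chosen):8.2f}")
print(f"actually best available plan:         {actual_best:6.2f}, "
      f"true value: {true_value(actual_best):8.2f}")
```

In this toy, if grader were literally identical to true_value the argmax would be fine, which is why I’m asking whether a perfect grader would solve the problem in theory.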
In case anyone comes across this post trying to understand the field: Scott Aaronson did a better job than me at describing the “seed AI” and “prosaic AI” paradigms here, which he calls “Orthodox” vs “Reform”.
I’m probably missing something, but doesn’t this just boil down to “misspecified goals lead to reward hacking”?
This post makes sense to me, though it feels almost trivial. I’m puzzled by the backlash against consequentialism; it just feels like people are overreacting. Or maybe the ‘backlash’ isn’t actually as strong as I’m reading it to be.
I’d think of virtue ethics as some sort of equilibrium that society has landed itself in after all these years of being a species capable of thinking about ethics. It’s not the best, but you’d need more than naive utilitarianism to beat it (this EA forum post feels like common sense to me too), which is what you describe as reflective consequentialism. It seems like it all boils down to: be a consequentialist, as long as you 1) account for second-order and higher effects, and 2) account for bad calculations due to corrupted hardware.
Thanks—this helps.
Thanks for the reply!
But I think you can come up with clean examples of capabilities failures if you look at, say, robots that use search to plan; they often do poorly according to the manually specified reward function on new domains because optimizing the reward is too hard for its search algorithm.
I’d be interested to see actual examples of this, if there are any. But also, how would this not be an objective robustness failure if we frame the objective as “maximize reward”?
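For concreteness, here’s a minimal toy sketch of the scenario I picture from your example (entirely my own construction; the 1-D track, reward values, and search depths are made up). The reward below is the manually specified objective, yet the shallow planner scores poorly on it purely because its search is too weak to see the distant +100.

```python
# Toy sketch (my own construction) of a search-based planner that fails at its
# own hand-specified reward because its search is too shallow.

# 1-D track: the agent starts at 0 and makes 10 moves of +1 or -1.
# Reward is collected every time the agent lands on a rewarded position.
REWARD = {1: 1.0, -5: 100.0}  # a nearby snack vs. a faraway jackpot

def lookahead(pos: int, depth: int) -> float:
    """Best reward obtainable within `depth` further moves (exhaustive search)."""
    if depth == 0:
        return 0.0
    return max(REWARD.get(pos + step, 0.0) + lookahead(pos + step, depth - 1)
               for step in (-1, +1))

def run(search_depth: int, horizon: int = 10) -> float:
    """Act greedily with a depth-limited lookahead; return total reward collected."""
    pos, total = 0, 0.0
    for _ in range(horizon):
        step = max((-1, +1),
                   key=lambda s: REWARD.get(pos + s, 0.0)
                   + lookahead(pos + s, search_depth - 1))
        pos += step
        total += REWARD.get(pos, 0.0)
    return total

print("weak search (depth 2): ", run(search_depth=2))   # stays near the +1 (total 5.0)
print("strong search (depth 6):", run(search_depth=6))  # reaches and farms the +100 (total 300.0)
```

In this toy the depth-2 planner’s low score looks like a pure capabilities failure (its search is too weak), but since the thing it fails at is literally “maximize REWARD”, I’m still unsure where the line is between that and an objective robustness failure.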
if you perform Inverse Optimal Control on the behavior of the robot and derive a revealed reward function, you’ll find that its
Do you mean to say that its reward function will be indistinguishable from its policy?
Interesting paper, thanks! If a policy cannot be decomposed into a planning algorithm and a reward function anyway, it’s unclear to me why 2D-robustness would be a better framing of robustness than just 1D-robustness.
Thanks for the example, but why is this a capabilities robustness problem and not an objective robustness problem, if we think of the objective as ‘classify pandas accurately’?
I don’t know how I even got here after so long but I really like this post. Looking forward to next year’s post.
In some industries, Stop Work Authorities are implemented, where any employee at any level of the organisation has the power to stop any work deemed unsafe, at any time. I wonder whether something similar in spirit would be feasible to implement at top AI labs.