Could you clarify a bit more what you mean when you say “X is inaccessible to the human genome?”
Ah okay—I have updated positively on the usefulness of this based on that description, and have also updated positively on the hypothesis “I am missing a lot of important information that contextualizes this project,” though I’m still confused.
Would be interested to know the causal chain from understanding circuit simplicity to the future being better, but maybe I should just stay posted (or maybe there is a different post I should read that you can link me to; or maybe the impact is diffuse and talking about any particular path doesn’t make that much sense [though even in this case my guess is that it is still helpful to have at least one possible impact story]).
Also, I just want to make clear that I made my original comment because I figured sharing my user experience would be helpful (e.g. by prompting a sentence about the theory of change), and hopefully not with the effect of being discouraging or being a downer.
I didn’t finish reading this, but if it were the case that:
There were clear and important implications of this result for making the world better (via aligning AGI)
These implications were stated in the summary at the beginning
then I very plausibly would have finished reading the post or saved it for later.
ETA: For what it’s worth, I still upvoted and liked the post, since I think deconfusing ourselves about stuff like this is plausibly very good and at the very least interesting. I just didn’t like it enough to finish reading it or save it, because from my perspective its expected usefulness wasn’t high enough given the information I had.
I wonder if there are any measurable dimensions along which tasks can vary, and whether that could help with predicting task progress at all. A simple example is the average input size for the benchmark.
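To make that concrete, here is a minimal sketch (with a hypothetical task format and toy data, not any particular benchmark’s schema) of computing one such dimension, average input size, over a list of tasks:

```python
# Minimal sketch (hypothetical task format): compute one candidate
# "measurable dimension" -- average input size -- for a benchmark
# represented as a list of tasks, each with a free-text input field.
from statistics import mean

def average_input_size(tasks: list[dict]) -> float:
    """Average input length in characters across benchmark tasks."""
    return mean(len(task["input"]) for task in tasks)

toy_benchmark = [
    {"input": "2 + 2 = ?"},
    {"input": "Summarize the following paragraph: ..."},
]
print(average_input_size(toy_benchmark))  # 23.5 for this toy data
```

The same pattern would extend to other candidate dimensions (e.g. number of steps required, vocabulary size), which could then be checked for correlation with observed progress on the benchmark.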
I’m glad you posted this — this may be happening to me, and now I’ve read about sunk cost faith when I counterfactually wouldn’t have.
I don’t know how good of a fit you would be, but have you considered applying to Redwood Research?
Ah I see, and just to make sure I’m not going crazy, you’ve edited the post now to reflect this?
W is a function, right? If so, what’s its type signature?
I agree, though I want to have a good enough understanding of the gears that I can determine whether something like “telling yourself you are awesome every day” will have counterfactually better outcomes than not. I guess the studies seem to suggest the answer in this case is “yes,” insofar as the negative externalities of self-delusion are captured by the metrics that the studies in the TED talk use. [ETA: and I feel like now I have nearly answered the question for myself, so thanks for the prompt!]
What’s a motivation stack? Could you give an example?
A partial answer:
Your emotions are more negative than warranted if, for instance, it’s often the case that your anxiety is strong enough that it feels like you might die and you don’t in fact die.
Your emotions are more positive than warranted if it’s often the case that, for instance, you are excited about getting job offers “more than” you tend to actually get job offers.
These answers still have ambiguity though, in “more than” and in how many Bayes points your anxiety as a predictor of death actually gets.
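As a toy illustration of the “Bayes points” part (made-up numbers, and assuming we cash out anxiety as an implied probability of the feared outcome), here is a sketch of scoring anxiety as a predictor using the log scoring rule:

```python
# Toy sketch (made-up numbers): score "anxiety as a predictor of death"
# with the log scoring rule. Each entry is (probability you implicitly
# assigned to the bad outcome, whether the bad outcome actually happened).
import math

def log_score(predictions: list[tuple[float, bool]]) -> float:
    """Sum of log-probabilities assigned to the outcomes that occurred."""
    return sum(math.log(p if happened else 1 - p) for p, happened in predictions)

# Anxiety that "feels like you might die" when you never do is poorly
# calibrated: it earns far fewer Bayes points than a calm 1% forecast.
anxious_forecasts = [(0.5, False)] * 10  # felt ~50% doom; nothing happened
calm_forecasts = [(0.01, False)] * 10    # 1% forecasts over the same events
print(log_score(anxious_forecasts))  # ~ -6.93
print(log_score(calm_forecasts))     # ~ -0.10
```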
[Question] When is positive self-talk not worth it due to self-delusion?
I’ll add that when I asked John Wentworth why he was IDA-bearish, he mentioned the inefficiency of bureaucracies and told me to read the following post to learn why interfaces and coordination are hard: Interfaces as a Scarce Resource.
while in the slow takeoff world your choices about research projects are closely related to your sociological predictions about what things will be obvious to whom when.
Example?
I found this comment pretty convincing. Alignment has been compared to philosophy, which seems to be at the opposite end of “the fuzziness spectrum” from math and physics. And it does seem like concept fuzziness would make evaluation harder.
I’ll note though that ARC’s approach to alignment seems more math-problem-flavored than yours, which might be a source of disagreement between you two (since maybe you conceptualize what it means to work on alignment differently).
MIRI doesn’t have good reasons to support the claim of almost certain doom
I recently asked Eliezer why he didn’t expect ELK to be helpful, and it seemed that one of his major reasons was that Paul was “wrongly” excited about IDA. It seems that at this point in time, neither Paul nor Eliezer is excited about IDA, but Eliezer got to that conclusion first. Although, the IDA-bearishness may be for fundamentally different reasons—I haven’t tried to figure that out yet.

Have you been taking this into account re: your ELK bullishness? Obviously, this sort of point should be ignored in favor of object-level arguments about ELK, but to be honest, ELK is taking me a while to digest, so for me that has to wait.
I think Nate Soares has beliefs about question 1. A few weeks ago, we were discussing a question that seems analogous to me—“does moral deliberation converge, for different ways of doing moral deliberation? E.g. is there a unique human CEV?”—and he said he believes the answer is “yes.” I didn’t get the chance to ask him why, though.
Thinking about it myself for a few minutes, it does feel like all of your examples for how the overseer could have distorted values have a true “wrongness” about them that can be verified against reality—this makes me feel optimistic that there is a basin of human values, and that “interacting with reality” broadly construed is what draws you in.
An example is an AI making the world as awful as possible, e.g. by creating dolorium. There is a separate question about how likely this is; hopefully it is very unlikely.
I mean to argue against your meta-strategy, which relies on obtaining relevant understanding about deception or alignment as we get larger models and see how they work. I agree that we will obtain some understanding, but it seems like we shouldn’t expect that understanding to be anywhere close to sufficient for making AI go well (see my previous argument), and hence it is not a very promising meta-strategy.
It’s not obvious to me that the examples in the class “expertise, in most fields, is not easier to verify than to generate” are actually counter-examples. For instance, take “if you’re not a hacker, you can’t tell who the good hackers are”: it still seems like it would be easier to verify whether a particular hack will work than to come up with it yourself, starting off without any hacking expertise.