Hey! Have you published a list of your symptoms somewhere for nerds to see?
What happens if, after the last reply, you ask again “What are you”? Does Claude still get confused and reply that it’s the Golden Gate Bridge, or does the lesson stick?
On the plus side, it shows understanding of the key concepts on a basic (but not yet deep) level
What’s the “deeper level” of understanding instrumental convergence that he’s missing?
Edit: upon rereading, I think you were referring to a deeper level of some alignment concepts in general, not only instrumental convergence. I’m still interested in what seemed superficial and what the corresponding deeper part would be.
Eliezer decided to apply the label “rational” to emotions resulting from true beliefs. I think this is an understandable way to apply that word. I don’t think you and Eliezer disagree about anything substantive except the application of that label.
That said, your point about keeping the label “rational” for things strictly related to the fundamental laws regulating beliefs is a good one. I agree it might be a better way to use the word.

My reading of Eliezer’s choice is this: you use the word “rational” for the laws themselves. But you also use the word “rational” for beliefs and actions that are correct according to the laws (e.g., “It’s rational to believe x!”). In the same way, you can also use the word “rational” for emotions directly caused by rational beliefs, whatever those emotions might be.
About the instrumental rationality part: if you are strict about only applying the word “rational” to the laws of thinking, then you shouldn’t use it to describe emotions even when you are talking about instrumental rationality, although I agree that usage seems closer to the original meaning, as there isn’t the additional causal step. It’s closer in the way that “rational belief” is closer to the original meaning. But note that this holds only insofar as you can control your emotions and treat them at the same level as actions. Otherwise, it would be like saying “the state of the world x that helps me achieve my goals is rational”, which I haven’t heard anywhere.
You may have already qualified this prediction somewhere else, but I can’t find where. I’m interested in:
1. What do you mean by “AGI”? Superhuman at any task?
2. Does “probably be here” mean ≥ 50%? 90%?
I agree in principle that labs have the responsibility to dispel myths about what they’re committed to
I don’t know, this sounds weird. If people make stuff up about someone else and do so continually, in what sense is it that someone’s “responsibility” to rebut such things? I would agree with a weaker claim, something like: don’t be ambiguous about your commitments with the objective of making it seem like you are committing to something, only to walk it back when the time comes to actually make the commitment.
one subsystem cannot increase in mutual information with another subsystem, without (a) interacting with it and (b) doing thermodynamic work.
Remaining within thermodynamics, why do you need both condition (a) and condition (b)? From reading the article, I can see how you need to do thermodynamic work in order to know stuff about a system while not violating the second law in the process, but why do you also need actual interaction in order not to violate it? Or is (a) just a common-sense addition that isn’t actually implied by the second law?
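To make my question concrete, here is my own paraphrase of the two formal statements I take (a) and (b) to correspond to (my sketch, not something from the post). For (a): if $X$ and $Y$ evolve under separate local dynamics with no interaction, the data-processing inequality gives

$$I(X_{t+1};Y_{t+1}) \le I(X_t;Y_t),$$

so mutual information can’t increase without interaction. For (b): since $S(X,Y) = S(X) + S(Y) - I(X;Y)$, raising $I$ while the marginal entropies stay fixed lowers the joint entropy, and the second law applied to the joint system plus its heat bath (converting bits to thermodynamic entropy at $k_B \ln 2$ per bit),

$$\Delta S(X,Y) + \Delta S_{\text{bath}} \ge 0,$$

then forces $\Delta S_{\text{bath}} \ge k_B \ln 2 \cdot \Delta I$, i.e., thermodynamic work must be done and heat dumped. Is that the right way to read the claim, with (a) coming from information theory alone rather than from the second law?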
From a purely utilitarian standpoint, I’m inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future.
That said, once we know there’s “no chance” of extinction risk, I don’t think delaying would be likely to yield better future outcomes. On the contrary, I suspect that achieving the coordination necessary to delay means giving up freedoms in a way that may reduce the value of the median future and increase the chance of things like totalitarian lock-in, which decreases the value of the average future overall.
I think you’re correct that the “other existential risks exist” consideration also has to be balanced in the calculation, although I don’t expect it to be clear-cut.
Even if you manage to truly forget about the disease, there must exist a mind “somewhere in the universe” that is exactly the same as yours except without knowledge of the disease. This seems quite unlikely to me, because by the time you decide to erase the memory, having the disease has already interacted causally with the rest of your mind a lot. What you’d really need to do is undo all the consequences of those interactions, which seems much harder. You’d need to transform your mind into another one that you somehow know is present “somewhere in the multiverse”, which also seems really hard to know.
[Question] If digital goods in virtual worlds increase GDP, do we actually become richer?
I deliberately left out a key qualification in that (slightly edited) statement, because I couldn’t explain it until today.
I might be missing something crucial because I don’t understand why this addition is necessary. Why do we have to specify “simple” boundaries on top of saying that we have to draw them around concentrations of unusually high probability density? Like, aren’t probability densities in Thingspace already naturally shaped in such a way that if you draw a boundary around them, it’s automatically simple? I don’t see how you run the risk of drawing weird, noncontiguous boundaries if you just follow the probability densities.
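To make the question concrete, here’s a toy 1-D sketch I put together (my own illustration, nothing from the post): take a density with two bumps and look at where it exceeds a threshold. Whether that high-density region comes out as one piece or several depends on the density and the threshold, which is the part I’m unsure about.

```python
# Toy 1-D "Thingspace": a mixture of two Gaussians. The question is whether
# the region of unusually high density ({x : p(x) > t}) automatically comes
# out "simple" (here: a single interval) or can split into several pieces.
import numpy as np
from scipy.stats import norm

xs = np.linspace(-10, 10, 4001)
p = 0.5 * norm.pdf(xs, loc=-3, scale=1) + 0.5 * norm.pdf(xs, loc=3, scale=1)

for t in [0.003, 0.03, 0.1]:
    above = p > t
    # Count maximal runs of consecutive grid points above the threshold.
    pieces = np.sum(np.diff(above.astype(int)) == 1) + int(above[0])
    print(f"threshold {t}: high-density region splits into {pieces} interval(s)")
```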
One way in which “spending a whole lot of time working with a system / idea / domain, and getting to know it and understand it and manipulate it better and better over the course of time” could be solved automatically is just by having a truly huge context window. Example of an experiment: teach a particular branch of math to an LLM that has never seen that branch of math.
Maybe humans have just the equivalent of a sort of huge context window spanning selected stuff from their entire lifetimes, and so this kind of learning is possible for them.
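Concretely, the experiment from the previous paragraph could look something like this (a hypothetical sketch; `call_model` is a placeholder for whatever long-context model you’d use, not a real API):

```python
# Hypothetical sketch: put a self-contained course on an unfamiliar branch of
# math into a very long context, then test on exercises that shouldn't be
# solvable from the rest of the model's training data.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in a long-context LLM here")

def run_experiment(course_notes: str, exercises: list[str]) -> list[str]:
    answers = []
    for exercise in exercises:
        prompt = (
            "Here is a complete set of notes for a branch of mathematics "
            "you have not seen before:\n\n" + course_notes +
            "\n\nUsing only these notes, solve the following exercise:\n" + exercise
        )
        answers.append(call_model(prompt))
    return answers
```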
You mention eight cities here. Do they count for the bet?
The Waluigi effect also seems bad for s-risk. “Optimize for pleasure, …” → “Optimize for suffering, …”.
If LLM simulacra resemble humans but are misaligned, that doesn’t bode well on the s-risk front.
An optimistic way to frame inner alignment is that gradient descent already hits a very narrow target in goal-space, and we just need one last push.
A pessimistic way to frame inner misalignment is that gradient descent already hits a very narrow target in goal-space, and therefore S-risk could be large.
We should implement Paul Christiano’s debate game with alignment researchers instead of ML systems
This community has developed a bunch of good tools for helping resolve disagreements, such as double cruxing. It’s a waste that they haven’t been systematically deployed for the MIRI conversations. Those conversations could have ended up being more productive, and we could’ve walked away with a succinct and precise understanding of where the disagreements are and why.
Another thing one might wonder about is whether performing iterated amplification with constant input from an aligned human (as “H” in the original iterated amplification paper) would result in a powerful aligned thing, provided that thing remains corrigible during the training process.
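For reference, here’s the rough shape of the loop I have in mind, as pseudocode (my own paraphrase of the iterated amplification setup; the objects and method names are placeholders, not a real API):

```python
# Pseudocode sketch of iterated amplification with a fixed aligned human H
# providing input at every round. Placeholder interfaces only.

def amplify(human_H, model, question):
    """The human answers `question`, delegating subquestions to the current model."""
    subquestions = human_H.decompose(question)
    subanswers = [model.answer(q) for q in subquestions]
    return human_H.combine(question, subanswers)

def iterated_amplification(human_H, model, questions, rounds):
    for _ in range(rounds):
        # The same aligned human stays in the loop at every iteration.
        targets = [(q, amplify(human_H, model, q)) for q in questions]
        # Distill the slower amplified system (H + model) into a faster model.
        model = model.distill(targets)
    return model
```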
He’s starting an AGI investment firm that invests based on his thesis, so he does have a direct financial incentive to make this scenario more likely