This account exists only for archival purposes.
[deactivated]
This doesn’t apply to more central cases like “gay” and “Black”.
Fair! I should have said 1,000 years to make the point more clear-cut.
It would be much more helpful if Scott used a real example rather than a fictional one. I don’t think his fictional example is very realistic.
Scott Alexander is wrong about slurs
Thanks!
Thanks for posting this. I am still a bit fuzzy on what exactly the Superalignment plan is, or if there even is a firm plan at this stage. Hope we can learn more soon.
I think that my not wearing shoes at university is evidence that I might also disdain sports, but not evidence that I might steal.
It is not actually the case that violating one specific social norm for a specific reason is a substantial update that someone is a Breaking Social Boundaries Type Pokemon in general.
If I can attempt to synthesize these two points into a single point: don’t assume weird people are evil.
If someone walks around barefoot in an urban environment, that’s a good clue they might also be weird in other ways. But weird ≠ evil.
Principled non-conformity is a thing. Human diversity is a thing. Eccentricity is a thing.
If weirdness indicated evil, then LessWrong would be a hive of scum and villainy.
Uncritically enforcing rules and conformity to an idea of normalcy is not good. It has done great harm.
Longtermism question: has anyone ever proposed a discount rate on the moral value of future lives? By analogy to discount rates used in finance and investing.
This could account for the uncertainty in predicting the existence of future people. Or serve as a compromise between views like neartermism and longtermism, or pro-natalism and anti-natalism.
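To illustrate what I have in mind (a minimal sketch with made-up numbers and a made-up rate, not a proposal anyone has actually published as far as I know): in finance, a cash flow V received t years from now has a present value of V / (1 + r)^t, where r is the discount rate. Applying the same formula to moral value would look something like this:

```python
def discounted_moral_value(value_today_equivalent, years_in_future, annual_rate):
    """Present moral weight of a life t years in the future, discounted the
    way finance discounts future cash flows: V / (1 + r)**t.
    Hypothetical illustration only; the rate and values are made up."""
    return value_today_equivalent / (1 + annual_rate) ** years_in_future

# With a 1% annual rate, a life 100 years out carries ~37% of the weight
# of a present life, and a life 1,000 years out carries ~0.005%.
print(discounted_moral_value(1.0, 100, 0.01))    # ~0.37
print(discounted_moral_value(1.0, 1000, 0.01))   # ~4.8e-05
```

The rate here could be doing double duty: it might represent pure time preference, or just the falling probability that the future people in question ever come to exist, which is the uncertainty-based reading I had in mind.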
Now he’s free to run for governor of California in 2026:
I was thinking about it because I think the state is in a very bad place, particularly when it comes to the cost of living and specifically the cost of housing. And if that doesn’t get fixed, I think the state is going to devolve into a very unpleasant place. Like one thing that I have really come to believe is that you cannot have social justice without economic justice, and economic justice in California feels unattainable. And I think it would take someone with no loyalties to sort of very powerful interest groups. I would not be indebted to other groups, and so maybe I could try a couple of variable things, just on this issue.
...
I don’t think I’d have enough experience to do it, because maybe I could do like a few things that would be really good, but I wouldn’t know how to deal with the thousands of things that also just needed to happen.
And more importantly than that to me personally, I wanted to spend my time trying to make sure we get artificial intelligence built in a really good way, which I think is like, to me personally, the most important problem in the world and not something I was willing to set aside to run for office.
Prediction market: https://manifold.markets/firstuserhere/will-sam-altman-run-for-the-governo
William Nordhaus estimates that firms recover maybe 2% of the value they create by developing new technologies.
Isn’t this the wrong metric? 2% of the value of a new technology might be a lot of money, far in excess of the R&D cost required to create it.
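For instance (hypothetical numbers, purely to illustrate the distinction): if a new technology generates $10 billion in total social value, a firm capturing 2% still earns $200 million; if the R&D cost $50 million, the investment pays off handsomely even though 98% of the value spills over to everyone else. Whether firms invest depends on that comparison, not on the capture fraction alone.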
I think you are way overestimating your ability to tell who is trans and way underestimating the ability of trans people to pass as cis. Sometimes, you just can’t tell.
What on Earth? Why does it require being “devious” to be in the closet? If you were given a choice between lifelong celibacy and loneliness, on the one hand, or, on the other hand, seriously endangering yourself, risking being imprisoned or institutionalized, and ruining your life (economically and socially) by having relationships and disclosing them, would it make you “devious” to choose a third option and keep your relationships secret?
Were Jews who hid from the Nazis “devious”? Were people who helped them hide “devious”? Only in a sense that drains the word “devious” of its negative moral connotation.
The documentary “Before Stonewall” covers what gay life was like in the 40s, 50s, and 60s. I would recommend it.
The phrase “change sex” projects an anti-trans aura. (Not as much as using a slur, but it still makes me wince.) People say “transition” these days.
Another factor worth considering is that many trans people “pass” as cis, so you wouldn’t necessarily know someone is trans just by looking at them.
Does your town have a local PFLAG chapter? Another LGBT organization? If so, there might be trans people involved there.
He said:
At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models
What do you think he meant by “AlphaGo-type systems”? I could be wrong, but I interpreted that as a reference to reinforcement learning (RL).
This seems super important to the argument! Do you know if it’s been discussed in detail anywhere else?
We are on track to build many superhuman AI systems. Unless something unexpectedly good happens, eventually we will build one that has a failure of inner alignment. And then it will kill us all. Does the probability of any given system failing inner alignment really matter?
Yes, because if the first superhuman AGI is aligned, and if it performs a pivotal act to prevent misaligned AGI from being created, then we will avert existential catastrophe.
If there is a 99.99% chance of that happening, then we should be quite sanguine about AI x-risk. On the other hand, if there is only a 0.01% chance, then we should be very worried.
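A toy calculation (my own illustration with made-up numbers, not something from the post) shows why the outcome hinges so heavily on that first system: without a pivotal act, survival requires every system we ever build to be aligned; with one, it mostly reduces to whether the first system is aligned.

```python
# Toy model (made-up numbers): p_failure is the per-system probability that a
# superhuman system fails inner alignment.

def p_survival_no_pivotal_act(p_failure, n_systems):
    # Every one of n independently built systems must turn out aligned.
    return (1 - p_failure) ** n_systems

def p_survival_with_pivotal_act(p_failure):
    # The outcome roughly hinges on whether the first system is aligned.
    return 1 - p_failure

print(p_survival_no_pivotal_act(0.01, 1000))   # ~4e-05
print(p_survival_with_pivotal_act(0.01))       # 0.99
```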
I have a question about “AGI Ruin: A List of Lethalities”.
These two sentences from Section B.2 stuck out to me as the most important in the post:
...outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction.
...on the current optimization paradigm there is no general idea of how to get particular inner properties into a system, or verify that they’re there, rather than just observable outer ones you can run a loss function over.
My question is: supposing this is all true, what is the probability of failure of inner alignment? Is it 0.01%, 99.99%, 50%...? And how do we know how likely failure is?
It seems like there is a gulf between “it’s not guaranteed to work” and “it’s almost certain to fail”.
I don’t know if anyone still reads comments on this post from over a year ago. Here goes nothing.
I am trying to understand the argument(s) as deeply and faithfully as I can. These two sentences from Section B.2 stuck out to me as the most important in the post (from the point of view of my understanding):
...outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction.
...on the current optimization paradigm there is no general idea of how to get particular inner properties into a system, or verify that they’re there, rather than just observable outer ones you can run a loss function over.
My first question is: supposing this is all true, what is the probability of failure of inner alignment? Is it 0.01%, 99.99%, 50%...? And how do we know how likely failure is?
It seems like there is a gulf between “it’s not guaranteed to work” and “it’s almost certain to fail”.
There is a strong argument that the term is bad and misleading. I will concede that.
Well, my examples are both real and non-fringe, whereas “Asian” and “field work” are fictional and fringe, respectively. So, I think “gay” and “Black” are more central examples.
Scott also seems annoyed by “Black”, but doesn’t explain why.
There’s a bit more here than I can readily respond to right now, but let me know if you think I’ve avoided the crux of the matter and you’d like me to address it in a future comment.