AIS student, self-proclaimed aspiring rationalist, very fond of game theory.
“The only good description is a self-referential description, just like this one.”
momom2
I think you are missing one important existential risk separate from extinction: a lastingly suboptimal society. Think systematic institutional inefficiency, with no way to change anything because of disempowerment.
In that scenario, maybe humanity is still around because one of the things we can measure and optimize for is keeping a minimum number of humans alive, but the living conditions are undesirable.
I’m not sure either, but here’s my current model:
Even though it looks pretty likely that AISC is an improvement on no-AISC, there are very few potential funders:
1) EA-adjacent charitable organizations.
2) People from AIS/rat communities.
Now, how to explain their decisions?
For the former, my guess would be a mix of not having heard of AISC (or not having received an application from it) and preferring to optimize heavily towards top-rated charities. AISC’s work is hard to quantify, as you can tell from the most upvoted comments, and that’s a problem when you’re looking for projects to invest in, because you need to avoid being criticized for that kind of choice if AISC turns out to be crackpottery or a waste of funds. The Copenhagen interpretation of ethics applies hard there for any opponent with a grudge against the organization.
For the latter, it depends a lot on individual people, but here are the possibilities that come to mind:
- Not wanting to donate anything but feeling obligated to, which leads to large donations to few projects once you feel strongly enough to break the status quo bias.
- Being especially mindful of one’s finances and donating only to preferred charities, because of a personal attachment (again, not likely to pick AISC a priori) or because they’re provably effective.
To answer 2), could you say why you don’t donate to AISC? Your motivations are probably very similar to those of other potential donors here.
Follow this link to find it. The translation is mine and open to comments; don’t hesitate to suggest improvements.
It’s not obvious at all to me, but it’s certainly a plausible theory worth testing!
To whom it may concern, here’s a translation of “Bold Orion” into French.
A lot of the argumentation in this post is plausible, but also, like, not very compelling?
Mostly the “frictionless” model of sexual/gender norms and the associated examples: I can see why these situations are plausible (if only because they’re very present in my local culture), but I wouldn’t be surprised if they turned out to be social myths, in which case the whole post is invalidated.
I appreciate the effort though; it’s food for thought even if it doesn’t tell me much about how to update based on the conclusion.
Epistemic status: Had a couple conversations on AI Plans with the founder, participated in the previous critique-a-thon. I’ve helped AI Plans a bit before, so I’m probably biased towards optimism.
Neglectedness: Very neglected. AI Plans wants to become a database of alignment plans which would allow quick evaluation of whether an approach is worth spending effort on, at least as a quick sanity check for outsiders. I can’t believe it didn’t exist before! It’s still very rough and unusable for that purpose for now, but that’s what the critique-a-thon is for: hopefully, as critiques accumulate and more votes are fed into the system, it will become more useful.
Tractability: High. It may be hard to make winning critiques, but given the current state of AI Plans, it’s very easy to make an improvement; if nothing else, you can filter out the obvious failures.
Impact: I’m not as confident here. If AI Plans works as intended, it could be very valuable for allocating funds more efficiently and for saving time by figuring out which approaches should be discarded. However, it’s possible that it will simply fail to gain steam and become a stillborn project. I’ve followed it for a couple of months, and I’ve been positively surprised several times, so I’m pretty optimistic.
The bar to entry is pretty low; if you’ve been following AIS blogs or forums for several months, you probably have something to contribute. It’s very unlikely you’ll have a negative impact.
It may also be an opportunity for you to talk with AIS-minded people and test your opinions on a practical problem; if you feel like an armchair safetyist and are tired of being one, this is the occasion to level up.
Another way to think about it: engagement was very low in the previous critique-a-thon, so if you have a few hours to spare, you can earn some easy money and fuzzies even if you’re not sure about the value in utilons.
Thank you, this is incredibly interesting! Did you ever write up more on the subject? I’m excited to see how it relates to mesa-optimisation in particular.
In the finite case, where , then
Typo: I think you mean ?
I’m surprised to hear they’re posting updates about CoEm.
At a conference held by Connor Leahy, I said that I thought it was very unlikely to work and asked why they were interested in this research area; he answered that they were not seriously invested in it.
We didn’t go deeper into the topic and it was several months ago, so it’s possible that (1) I misremember, (2) they have changed their minds, or (3) I appeared adversarial and he didn’t feel like debating CoEm. (For example, maybe he actually said that CoEm didn’t look promising, and that changed recently?)
Still, anecdotal evidence is better than nothing, and I look forward to seeing OliviaJ compile a document to shed some light on it.
Nice! Is this on ai-plans already?
I invite you. You can send me this summary in private to avoid downvotes.
There’s a whole part of the argument missing: the framing of this as being about AI risk.
I’ve seen various proposed explanations for why this happened; the board being worried about AI risk is one of them, but not the most plausible, afaict.
In addition, this is phrased similarly to technical problems like corrigibility, which it is very much not about.
People who say “why can’t you just turn it off” typically mean literally turning off the AI if it appears to be dangerous, which is not what this is about. This is about turning off the AI company, not the AI.
1- I didn’t know an Executive Order could be repealed easily. Could you elaborate?
2- Why is it good news? To me, this looks like a clear improvement on the previous state of regulation.
AlexNet dates back to 2012; I don’t think earlier work on AI can be compared to modern statistical AI.
Paul Christiano’s foundational paper on RLHF dates back to 2017.
Arguably, all agent foundations work has turned out to be useless so far, so prosaic alignment work may be what Roko takes as the beginning of AIS as a field.
The AI safety leaders currently see slow takeoff as humans gaining capabilities, and this is true; and also already happening, depending on your definition. But they are missing the mathematically provable fact that information processing capabilities of AI are heavily stacked towards a novel paradigm of powerful psychology research, which by default is dramatically widening the attack surface of the human mind.
I assume you do not have a mathematical proof of that, or you’d have mentioned it. What makes you think it is mathematically provable?
I would be very interested in reading more about the avenues of research dedicated to showing how AI can be used for psychological attacks from the perspective of AIS (I’d expect such research to be private by default due to infohazards).
I don’t understand how the parts fit together. For example, what’s the point of presenting the (t-,n)-AGI framework or the Four Background Claims?
I assume it’s incomplete. It doesn’t present the other 3 anchors mentioned, nor the forecasting studies.
To avoid being negatively influenced by perverse incentives to make societally risky plays, couldn’t TurnTrout just leave the handling of his finances to someone else and be unaware of whether or not he has Google stock?
It doesn’t matter if he does, as long as he doesn’t think he does; and if he’s uncertain about it, I think that will already psychologically reduce how much he cares about Google stock.
Not before reading the link, but Elizabeth did state that they expected the pro-meat section to be terrible without reading it, presumably because of the first part.
Since the article is low-quality in the part they read, and they expected low quality in the part they didn’t, they shouldn’t take it as evidence of anything at all; that is why I think it’s probably confirmation bias to take it as evidence against excess meat being related to health issues.
Reason for retraction: In hindsight, I think my tone was unjustifiably harsh and incendiary. Also, the karma suggests that whatever I wrote probably wasn’t that interesting.
Not everything suboptimal, but suboptimal in a way that causes suffering on an astronomical scale (e.g. galactic dystopia, or dystopia that lasts for thousands of years, or dystopia with an extreme number of moral patients (e.g. uploads)).
I’m not sure what you mean by Ord, but I think it’s reasonable to have a significant probability of S-risk from a Christiano-like failure.