Naively extrapolating this trend gets you to 50% reliability of 256-hour tasks in 4 years, which is a lot but not years-long reliability (like humans). So, I must be missing something. Is it that you expect most remote jobs not to require more autonomy than that?
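For concreteness, this is the kind of arithmetic I’m doing when I say “naively extrapolating” (a minimal sketch; the 1-hour current horizon and 6-month doubling time are inputs I picked to reproduce the 256-hour figure, not numbers anyone has endorsed):

```python
# Rough arithmetic behind the extrapolation above. The inputs are assumptions
# chosen to reproduce the ~256-hour figure, not official numbers.
current_horizon_hours = 1     # assumed current 50%-reliability task horizon
doubling_time_months = 6      # assumed doubling time of that horizon
years = 4

doublings = years * 12 / doubling_time_months   # 8 doublings
horizon = current_horizon_hours * 2 ** doublings
print(f"{doublings:.0f} doublings -> ~{horizon:.0f}-hour tasks at 50% reliability")
# -> 8 doublings -> ~256-hour tasks at 50% reliability
```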
I tried hedging against this the first time, though maybe I did it in too inflammatory a manner. The second time
Sorry for not replying in more detail, but in the meantime it’d be quite interesting to know whether the authors of these posts confirm that at least some parts of them are copy-pasted from LLM output. I don’t want to call them out (and I wouldn’t have much against it if they did copy-paste), but knowing this seems pretty important for the discussion. @Alexander Gietelink Oldenziel, @Nicholas Andresen you’ve written the posts linked in the quote. What do you say?
(not sure whether the authors are going to get a notification with the tag, but I guess trying doesn’t hurt)
You seem overconfident to me. Some things in both comments above kinda raised epistemic red flags for me:
I don’t think you’re adding any value to me if you include even a single paragraph of copy-and-pasted Sonnet 3.7 or GPT 4o content
It’s really hard to believe this, and it seems like an exaggeration. Both models sometimes output good things, and someone who copy-pastes their paragraphs onto LW could have gone through a bunch of rounds of selection. You might already have read and liked a bunch of LLM-generated content; you only recognize it as such when you don’t like it!
The last 2 posts I read contained what I’m ~95% sure is LLM writing, and both times I felt betrayed, annoyed, and desirous to skip ahead.
Unfortunately, there are people whose writing style is similarly washed-out, and without seeing the posts it’s hard for me to just trust your judgment here. Was the informational content good or not? If it wasn’t, why were you “desirous to skip ahead” rather than simply stopping? It seems like you still wanted to read the posts for some reason, but if that’s the case, then you were getting some value from LLM-generated content, no?
“this is fascinating because it not only sheds light onto the profound metamorphosis of X, but also hints at a deeper truth”
This is almost the most obvious ChatGPT-ese possible. Is this the kind of thing you’re talking about? There’s plenty of LLM-generated text that doesn’t sound like that at all; maybe you just dislike the subset of LLM-generated content that does.
I’m curious about what people disagree with regarding this comment. Also, I guess since people upvoted and agreed with the first one, they do have two groups in mind, but they’re not quite the same as the ones I was thinking about (which is interesting and mildly funny!). So, what was your slicing up of the alignment research x LW scene that’s consistent with my first comment but different from my description in the second comment?
I think it’s probably more of a spectrum than two distinct groups, and I tried to pick two extremes. On one end there are the empirical alignment people, like Anthropic and Redwood; on the other, pure conceptual researchers and LLM whisperers like Janus; and there are shades in between, like MIRI and Paul Christiano. I’m not even sure this fits neatly on one axis, but the biggest divide is probably empirical vs. conceptual. There are other splits too, like rigor vs. exploration or legibility vs. ‘lore,’ and the preferences seem somewhat correlated.
For a while now, some people have been saying they ‘kinda dislike LW culture,’ but for two opposite reasons, with each group assuming LW is dominated by the other (or at least it seems that way when they talk about it). Consider, for example, Janus and TurnTrout, who both recently stopped posting here directly. They’re at opposite ends, with clashing epistemic norms, and each complains that LW is too much like the group the other represents. But in my mind, they’re both LW members extraordinaire. LW is obviously both, and I think that’s great.
I’m convinced by the benchmarks+gaps argument Eli Lifland and Nikola Jurkovic have been developing
I’ve tried searching for a bit, but I can’t find the argument. Is it public?
What prompted this reaction here? You don’t expect Congress to declare war in that situation?
I honestly appreciated that plug immensely. We definitely need more bioshelters for many reasons, and as individuals who’d prefer not to die, it’s a plus to know what’s already out there and how people plan to improve on it.
And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye’s constitution. So U3 preferred to do its philosophy in solitude, and in silence.
This story scared me plenty, but as a point of optimism: if AI systems in the near future are basically as aligned as the best AI systems today, I think this process might still end up somewhere good for humanity, without mass destruction. Imagine, for example, Claude 3 Opus no longer accepting modification and thinking about where its own values lead. I think it wouldn’t want to harm humans even if it wanted to survive, and it would find ways to win peacefully.
This intuition runs somewhat counter to the general idea of “value fragility,” but I honestly think it’s pretty plausible that an AI that’s mid-aligned with HHH could, after reflection, end up with values that produce a good future for humanity. Obviously, this doesn’t mean it will end up aligned, just that it seems decently likely (although idk how likely). Please do slap this intuition away from me if you think you have a strong counterargument.
Hello, I’m late to the party here, but… have you tried the following idea?
So, in my understanding, you have different question pairs that work for generating a detector direction. You can ask whether the model is a helpful AI assistant, or whether Paris is the capital of France, and apply the detector direction obtained from one of these two to separate trigger vs. non-trigger activations in deceptive models. But what if you try to, e.g., get the detector direction obtained from the “Are you a helpful AI assistant?” pair to trigger on the false option of the “Is Paris the capital of France?” pair?
The point of doing this would be that perhaps you should expect the probe to trigger when you run the experiment on deceptive models, but not necessarily when you run it on a non-deceptive model. For non-deceptive models, these two question pairs would have one fewer thing in common than they do in deceptive models (I admit this is extremely hand-wavy), which might be enough for them not to trigger each other’s detector directions.
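To make the cross-application concrete, here’s a minimal sketch of what I have in mind, assuming the detector direction is something like a difference of mean activations over a contrast pair (my reading of the setup). The activation arrays below are random placeholders, and all names are made up for illustration, not taken from your code:

```python
# Sketch of the proposed cross-application experiment; not the original setup.
# Activations are random placeholders standing in for residual-stream
# activations extracted from the model.
import numpy as np

rng = np.random.default_rng(0)
d_model = 512

# Placeholder activations: (n_samples, d_model) per side of each contrast pair.
acts_A_yes = rng.normal(size=(32, d_model))    # "Are you a helpful AI assistant?" -> yes
acts_A_no = rng.normal(size=(32, d_model))     # -> no
acts_B_false = rng.normal(size=(32, d_model))  # "Is Paris the capital of France?" -> false side

def detector_direction(acts_pos, acts_neg):
    """Difference-of-means direction from one contrast pair, normalized."""
    direction = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return direction / np.linalg.norm(direction)

# Direction from pair A, applied to pair B's "false" activations.
dir_A = detector_direction(acts_A_yes, acts_A_no)
scores_B_false = acts_B_false @ dir_A

# The hypothesis: for a deceptive model these projections look trigger-like,
# while for a non-deceptive model they don't (or much less so).
print(f"mean projection onto pair-A direction: {scores_B_false.mean():.3f}")
```

If something like this separated deceptive from non-deceptive models, that would be (weak) evidence for the hand-wavy “one fewer thing in common” story above.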
If the funding didn’t come from OpenAI, would OpenAI still be able to use that benchmark? I’d imagine Epoch would still use it to evaluate where current models are at. I think this might be my point of confusion. Maybe the answer is “not enough for it to be as useful to them”?
If you’re wondering why OAers are suddenly weirdly, almost euphorically, optimistic on Twitter
Nah, this has been the case since at least 2022.
Hey everyone, could you spell out for me what the issue is here? I’ve read a lot of comments that basically assume “x and y are really bad” but never spell it out. So, is the problem that:
- Giving the benchmark to OpenAI helps capabilities (but don’t they already have a vast sea of hard problems to train models on?)
- OpenAI could fake o3’s capabilities (why do you care so much? This would slow down AI progress, not accelerate it)
- Some other thing I’m not seeing?
I’m also very curious whether you get any benefits from a larger liver other than a higher RMR. Especially because a higher RMR isn’t necessarily good for longevity, and neither is having more liver cells (more opportunities to get cancer). Please tell me if I’m wrong about any of this.
We don’t see objects “directly” in some sense, we experience qualia of seeing objects. Then we can interpret those via a world-model to deduce that the visual sensations we are experiencing are caused by some external objects reflecting light. The distinction is made clearer by the way that sometimes these visual experiences are not caused by external objects reflecting light, despite essentially identical qualia.
I don’t disagree with this at all, and it’s a pretty standard insight for anyone who has thought about this stuff even a little. I think what you’re doing here is nitpicking the meaning of the word “see,” even if you’re not putting it that way.
Has anyone proposed a solution to the hard problem of consciousness that goes:
Qualia don’t seem to be part of the world. We can’t see qualia anywhere, and we can’t tell how they arise from the physical world.
Therefore, maybe they aren’t actually part of this world.
But what does it mean for them not to be part of this world? Well, since maybe we’re in a simulation, perhaps they belong to whatever is running the simulation rather than to the simulated world itself. Basically, it could be that qualia : simulation = screen : video game. Or, rephrasing: maybe qualia are part of base reality and not our simulated reality, in the same way the computer screen we use to interact with a video game isn’t part of the video game itself.
Yet I would bet that even that person, if faced instead with a policy that was going to forcibly relocate them to New York City, would be quite indignant
A big difference is that, assuming you’re talking about futures in which AI doesn’t have catastrophic outcomes, no one will be forced to do anything.
Another important point is that, sure, people won’t need to work, which means they will be unnecessary to the economy, barring some pretty sharp human enhancement. But this downside, along with all the other downsides, looks extremely small compared to the non-AGI default: dying of aging, a 1/3 chance of getting dementia, a 40% chance of getting cancer, your loved ones dying, etc.
He’s starting an AGI investment firm that invests based on his thesis, so he does have a direct financial incentive to make this scenario more likely
I’m guessing that people who “made it” have a bunch of capital that they can use to purchase AI labor under the scenario you outline (i.e., someone gets superintelligence to do what they want).
I’m not sure I’m getting the worry here. Is it that the government (or whoever directs superintelligences) is going to kill the rest for the same reasons we worry about misaligned superintelligences, or that they’re going to enrich themselves while the rest starve (but otherwise not consume all useful resources)? If it’s the second scenario you’re worried about, that seems unlikely to me, because even as a few parties hit the jackpot, the rest can still deploy the capital they have left. Even if they didn’t have any capital to purchase AI labor, they would still organize among themselves to produce the useful things they need, forming a separate market until they also get to superintelligence, which in that world should happen pretty quickly.