Ivan Vendrov

Karma: 1,043

Ivan Vendrov Aug 31, 2024, 9:08 AM
8 points
0
on: Extended Interview with Zhukeepa on Religion
Really appreciated this exchange; Ben & Alex have rare conversational chemistry and an ability to sense-make productively at the edge of their world models.
I mostly agree with Alex on the importance of interfacing with extant institutional religion, though less sure that one should side with pluralists over exclusivists. For example, exclusivist religious groups seem to be the only human groups currently able to reproduce themselves, probably because exclusivism confers protection against harmful memes and cultural practices.
I’m also pursuing the vision of a decentralized singleton as an alternative to Moloch or turnkey totalitarianism, although it’s not obvious to me how the psychological insights of religious contemplatives are crucial here, rather than skilled deployment of social technology like the common law, nation states, mechanism design, cryptography, recommender systems, LLM-powered coordination tools, etc. Is there evidence that “enlightened” people, for some sense of “enlightened”, are in fact better at cooperating with each other at scale?
If we do achieve existential security through building a stable decentralized singleton, it seems much more likely that it would be the result of powerful new social tech, rather than the result of intervention on individual psychology. I suppose it could be the result of both with one enabling the other, like the printing press enabling the Reformation.

Ivan Vendrov Aug 28, 2024, 3:35 PM
5 points
1
in reply to: habryka’s comment on: O O’s Shortform
Definitely agree there’s some power-seeking equivocation going on, but wanted to offer a less sinister explanation from my experiences in AI research contexts. It seems that a lot of equivocation and blurring of boundaries comes from people trying to work on concrete problems and obtain empirical information, via a thought process like:
1. alignment seems maybe important?
2. ok what experiment can I set up that lets me test some hypotheses
3. can’t really test the long-term harms directly, let me test an analogue in a toy environment or on a small model, publish results
4. when talking about the experiments, I’ll often motivate them by talking about long-term harm
Not too different from how research psychologists will start out trying to understand the Nature of Mind and then run an n=20 study on undergrads because that’s what they had budget for. We can argue about how bad this equivocation is for academic research, but it’s a pretty universal pattern and well-understood within academic communities.
The unusual thing in AI is that researchers have most of the decision-making power in key organizations, so these research norms leak out into the business world, and no-one bats an eye at a “long-term safety research” team that mostly works on toy and short-term problems.
This is one reason I’m more excited about building up “AI security” as a field and hiring infosec people instead of ML PhDs. My sense is that the infosec community actually has good norms for thinking about and working on things-shaped-like-existential-risks, and the AI x-risk community should inherit those norms, not the norms of academic AI research.

Ivan Vendrov Aug 28, 2024, 2:57 PM
1 point
0
in reply to: gwern’s comment on: Bing Chat is blatantly, aggressively misaligned
by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn’t be a ‘warning shot’, it’d just be a ‘shot’ or ‘disaster’.)
Yours is the more direct definition but from context I at least understood ‘warning shot’ to mean ‘disaster’, on the scale of a successful terrorist attack, where the harm is large and undeniable and politicians feel compelled to Do Something Now. The ‘warning’ is not of harm but of existential harm if the warning is not heeded.
I do still expect such a warning shot, though as you say it could very well be ignored even if there are large undeniable harms (e.g. if a hacker group deploys a rogue AI that causes a trillion dollars of damage, we might take that as warning about terrorism or cybersecurity not about AI).

Ivan Vendrov Jul 22, 2024, 11:22 PM
4 points
2
on: Coalitional agency
Agreed that coalitional agency is somehow more natural than squiggly-optimizer agency. Besides people, another class of examples is historical empires (like the Persian and then the Roman), which were famously lenient ^[1] and respectful of local religious and cultural traditions; i.e. optimized coalition builders that offered goal-stability guarantees to their subagent communities, often stronger guarantees than those communities could expect by staying independent.
This extends my argument in Cooperators are more powerful than agents—in a world of hierarchical agency, evolution selects not for world-optimization / power-seeking but for cooperation, which looks like coalition-building (negotiation?) at the higher levels of organization and coalition-joining (domestication?) at the lower levels.

I don’t see why this tendency should break down at higher levels of intelligence; if anything it should get stronger as power-seeking patterns are detected early and destroyed by well-coordinated defensive coalitions. There’s still no guarantee that coalitional superintelligence will respect “human values” any more than we respect the values of ants; but contra Yudkowsky-Bostrom-Omohundro, doom is not the default outcome.
1. ^ if you surrendered!

Ivan Vendrov Jul 1, 2024, 4:31 PM
25 points
4
in reply to: Vladimir_Nesov’s comment on: Habryka’s Shortform Feed
Correct, I was not offered such paperwork nor any incentives to sign it. Edited my post to include this.

Ivan Vendrov Jul 1, 2024, 5:33 AM
140 points
12
in reply to: Bird Concept’s comment on: Habryka’s Shortform Feed
I left Anthropic in June 2023 and am not under any such agreement.
EDIT: nor was any such agreement or incentive offered to me.

Ivan Vendrov Jun 10, 2024, 1:25 PM
1 point
0
in reply to: FlorianH’s comment on: Searching for the Root of the Tree of Evil
1. Agree trust and cooperation are dual use, and I’m not sure how to think about this yet; perhaps the most important form of coordination is the one that prevents (directly or via substitution) harmful forms of coordination from arising.
2. One reason I wouldn’t call lack of altruism the root is that it’s not clear how to intervene on it; it’s like calling the laws of physics the root of all evil. I prefer to think about “how to reduce transaction costs to self-interested collaboration”. I’m also less sure that a society of people with more altruistic motives will necessarily do better… the nice thing about self-interest is that your degree of care is proportional to your degree of knowledge about the situation. A society of extremely altruistic people who are constantly devoting resources to solve what they believe to be other people’s problems may actually be less effective at ensuring flourishing.

Ivan Vendrov Jun 10, 2024, 1:18 PM
8 points
7
in reply to: Vaughn Papenhausen’s comment on: Searching for the Root of the Tree of Evil
You’re right the conclusion is quite underspecified—how exactly do we build such a cooperation machine?

I don’t know yet, but my bet is more on engineering, product design, and infrastructure than on social science. More like building a better Reddit or Uber (or supporting infrastructure layers like WWW and the Internet) than like writing papers.

Ivan Vendrov Jun 10, 2024, 1:13 PM
1 point
0
in reply to: Canaletto’s comment on: Searching for the Root of the Tree of Evil
Would love to see this idea worked out a little more!

Ivan Vendrov Nov 25, 2022, 5:40 PM
1 point
0
on: Guardian AI (Misaligned systems are all around us.)
I like the “guardian” framing a lot! Besides the direct impact on human flourishing, I think a substantial fraction of x-risk comes from the deployment of superhumanly persuasive AI systems. It seems increasingly urgent that we deploy some kind of guardian technology that at least monitors, and ideally protects against, such superhuman persuaders.

Ivan Vendrov Oct 22, 2022, 3:51 PM
3 points
0
in reply to: mako yass’s comment on: Cooperators are more powerful than agents
Symbiosis is ubiquitous in the natural world, and is a good example of cooperation across what we normally would consider entity boundaries.
When I say the world selects for “cooperation” I mean it selects for entities that try to engage in positive-sum interactions with other entities, in contrast to entities that try to win zero-sum conflicts (power-seeking).
Agreed with the complicity point—as evo-sim experiments like Axelrod’s showed us, selecting for cooperation requires entities that can punish defectors, a condition the world of “hammers” fails to satisfy.
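To make the punishment point concrete, here is a minimal sketch of an iterated prisoner's dilemma in the style of Axelrod's tournaments. The payoff values (3/3 for mutual cooperation, 1/1 for mutual defection, 5/0 for exploiting a cooperator) are the conventional ones, and the strategy and function names are chosen for illustration; this is not a reconstruction of any specific experiment.

```python
# Payoffs for (move_a, move_b): "C" = cooperate, "D" = defect.
# Conventional values, assumed here for illustration.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def always_cooperate(opponent_history):
    """A 'hammer': cooperates no matter what, so it cannot punish."""
    return "C"

def always_defect(opponent_history):
    """A pure exploiter."""
    return "D"

def tit_for_tat(opponent_history):
    """Cooperates first, then copies the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def play(strategy_a, strategy_b, rounds=10):
    """Play a repeated game and return the total scores (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy only sees the other player's past moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    print(play(always_cooperate, always_defect))  # defector exploits freely
    print(play(tit_for_tat, always_defect))       # exploitation is capped
    print(play(tit_for_tat, tit_for_tat))         # stable mutual cooperation
```

In this toy setup the defector's payoff against an unconditional cooperator is far higher than against a strategy that retaliates, which is the sense in which selecting for cooperation depends on the ability to punish.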

Ivan Vendrov Oct 4, 2022, 2:03 AM
1 point
0
in reply to: Krieger’s comment on: Any further work on AI Safety Success Stories?
Depends on offense-defense balance, I guess. E.g. if well-intentioned and well-coordinated actors are controlling 90% of AI-relevant compute then it seems plausible that they could defend against 10% of the compute being controlled by misaligned AGI or other bad actors—by denying them resources, by hardening core infrastructure, via MAD, etc.

Ivan Vendrov Oct 3, 2022, 9:35 PM
1 point
0
in reply to: Krieger’s comment on: Any further work on AI Safety Success Stories?
I would be interested in a detailed analysis of pivotal act vs gradual steering; my intuition is that many of the differences dissolve once you try to calculate the value of specific actions. Some unstructured thoughts below:
1. Both aim to eventually end up in a state of existential security, where nobody can ever build an unaligned AI that destroys the world. Both have to deal with the fact that power is currently broadly distributed in the world, so most plausible stories in which we end up with existential security will involve the actions of thousands if not millions of people, distributed over decades or even centuries.
2. Pivotal acts have stronger claims of impact, but generally have weaker claims of the sign of that impact—actually realistic pivotal-seeming acts like “unilaterally deploy a friendly-seeming AI singleton” or “institute a stable global totalitarianism” are extremely, existentially dangerous. If someone identifies a pivotal-seeming act that is actually robustly positive, I’ll be the first to sign on.
3. In contrast, gradual steering proposals like “improve AI lab communication” or “improve interpretability” have weaker claims to impact, but stronger claims to being net positive across many possible worlds, and are much less subject to multi-agent problems like races and the unilateralist’s curse.
4. True, complete existential safety probably requires some measure of “solving politics” and locking in current human values, hence may not be desirable. Like what if the Long Reflection decides that the negative utilitarians are right and the world should in fact be destroyed? I won’t put high credence on that, but there is some level of accidental existential risk that we should be willing to accept in order to not lock in our values.

Ivan Vendrov Oct 3, 2022, 4:27 PM
3 points
0
on: Any further work on AI Safety Success Stories?
You might find AI Safety Endgame Stories helpful—I wrote it last week to try to answer this exact question, covering a broad array of (mostly non-pivotal-act) success stories from technical and non-technical interventions.
Nate’s “how various plans miss the hard bits of the alignment challenge” might also be helpful as it communicates the “dynamics of doom” that success stories have to fight against.
One thing I would love is to have a categorization of safety stories by claims about the world. E.g., what does successful intervention look like in worlds where one or more of the following claims hold:
- No serious global treaties on AI ever get signed.
- Deceptive alignment turns out not to be a problem.
- Mechanistic interpretability becomes impractical for large enough models.
- CAIS turns out to be right, and AI agents simply aren’t economically competitive.
- Multi-agent training becomes the dominant paradigm for AI.
- Due to a hardware / software / talent bottleneck there turns out to be one clear AI capabilities leader with nobody else even close.
These all seem like plausible worlds to me, and it would be great if we had more clarity about what worlds different interventions are optimizing for. Ideally we should have bets across all the plausible worlds in which intervention is tractable, and I think that’s currently far from being true.

Ivan Vendrov Sep 29, 2022, 5:06 PM
1 point
0
in reply to: Noosphere89’s comment on: AI Safety Endgame Stories
I don’t mean to suggest “just supporting the companies” is a good strategy, but there are promising non-power-seeking strategies like “improve collaboration between the leading AI labs” that I think are worth biasing towards.
Maybe the crux is how strongly capitalist incentives bind AI lab behavior. I think none of the currently leading AI labs (OpenAI, DeepMind, Google Brain) are actually so tightly bound by capitalist incentives that their leaders couldn’t delay AI system deployment by at least a few months, and probably more like several years, before capitalist incentives in the form of shareholder lawsuits or new entrants that poach their key technical staff have a chance to materialize.

Ivan Vendrov Sep 29, 2022, 4:04 AM
3 points
2
in reply to: Paul Tiplady’s comment on: AI Safety Endgame Stories
Interesting, I haven’t seen anyone write about hardware-enabled attractor states but they do seem very promising because of just how decisive hardware is in determining which algorithms are competitive. An extreme version of this would be specialized hardware letting CAIS outcompete monolithic AGI. But even weaker versions would lead to major interpretability and safety benefits.

Ivan Vendrov Sep 29, 2022, 3:12 AM
1 point
0
in reply to: Noosphere89’s comment on: AI Safety Endgame Stories
Fabricated options are products of incoherent thinking; what is the incoherence you’re pointing out with policies that aim to delay existential catastrophe or reduce transaction costs between existing power centers?

Ivan Vendrov Sep 28, 2022, 2:06 AM
8 points
0
on: Why we’re not founding a human-data-for-alignment org
I’ve considered starting an org that was either aimed at generating better alignment data or would do so as a side effect and this is really helpful—this kind of negative information is nearly impossible to find.
Is there a market niche for providing more interactive forms of human feedback, where it’s important to have humans tightly in the loop with an ML process, rather than “send a batch to raters and get labels back in a few hours”? One reason RLHF is so little used is the difficulty of setting up this kind of human-in-the-loop infrastructure. Safety approaches like debate, amplification and factored cognition could also become competitive much faster if it was easier and faster to get complex human-in-the-loop pipelines running.
Maybe Surge already does this? But if not, you wouldn’t necessarily want to compete with them on their core competency of recruiting and training human raters. Just use their raters (or Scale’s), and build good reusable human-in-the-loop infrastructure, or maybe novel user interfaces that improve supervision quality.

Ivan Vendrov 24 Sep 2022 5:24 UTC
6 points
1
on: Why Do AI researchers Rate the Probability of Doom So Low?
I think a substantial fraction of ML researchers probably agree with Yann LeCun that AI safety will be solved “by default” in the course of making the AI systems useful. The crux is probably related to questions like how competent society’s response will be, and maybe the likelihood of deceptive alignment.
Two points of disagreement though:
- I don’t think setting P(doom) = 10% indicates lack of engagement or imagination; Toby Ord in The Precipice also gives a 10% estimate for AI-derived x-risk this century, and I assume he’s engaged pretty deeply with the alignment literature.
- I don’t think P(doom) = 10% or even 5% should be your threshold for “taking responsibility”. I’m not sure I like the responsibility frame in general, but even a 1% chance of existential risk is big enough to outweigh almost any other moral duty in my mind.

Ivan Vendrov 31 Aug 2022 16:49 UTC
LW: 3 AF: 2
0
AF
in reply to: evhub’s comment on: How likely is deceptive alignment?
Thank you for putting numbers on it!
~60%: there will be an existential catastrophe due to deceptive alignment specifically.
Is this an unconditional prediction of a 60% chance of existential catastrophe due to deceptive alignment alone, in contrast to the commonly used 10% chance of existential catastrophe due to all AI sources this century? Or do you mean that, conditional on there being an existential catastrophe due to AI, there is a 60% chance it will be caused by deceptive alignment, and 40% by other problems like misuse or outer alignment?
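The gap between the two readings is easy to see with a little arithmetic. The sketch below assumes the 10% all-sources figure mentioned above purely for illustration; the function name is hypothetical.

```python
def unconditional_risk(p_ai_catastrophe, share_from_deception):
    """P(catastrophe from deceptive alignment) under the conditional reading:
    P(deception | AI catastrophe) * P(AI catastrophe)."""
    return p_ai_catastrophe * share_from_deception

# Reading 1: the 60% is already unconditional.
reading_unconditional = 0.60

# Reading 2: the 60% is conditional on an AI catastrophe happening,
# and we assume a 10% overall chance of AI catastrophe (illustrative).
reading_conditional = unconditional_risk(0.10, 0.60)

if __name__ == "__main__":
    print(f"unconditional reading: {reading_unconditional:.0%}")
    print(f"conditional reading:   {reading_conditional:.0%}")
```

Under these assumptions the two readings differ by a factor of ten (60% versus roughly 6%), which is why the distinction matters.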