Ivan Vendrov

Karma: 1,000

Ivan Vendrov Apr 10, 2025, 4:07 AM
2 points
0
in reply to: abramdemski’s comment on: Ivan Vendrov’s Shortform
Minor points just to get them out of the way:
1. I think Bayesian optimization still makes sense with infinite compute if you have limited data (infinite compute doesn’t imply perfect knowledge, you still have to run experiments in the world outside of your computer).
2. The reason I specified evolutionary search is because that’s the claim I see Lehman & Stanley as making: that algorithms pursuing simple objectives tend to not be robust in an evolutionary sense. I’m less confident making claims about broader classes of optimization, but I’m not intentionally excluding them.
Meta point: it feels like we’re bouncing between incompatible and partly-specified formalisms before we even know what the high level worldview diff is.
To that end, I’m curious what you think the implications of the Lehman & Stanley hypothesis would be—supposing it were shown even for architectures that allow planning to search, which I agree their paper does not do. So yes you can trivially exhibit a “goal-oriented search over good search policies” that does better than their naive novelty search, but what if it turns out a “novelty-oriented search over novelty-oriented search policies” does better still? Would this be a crux for you, or is this not even a coherent hypothetical in your ontology of optimization?

Ivan Vendrov Apr 10, 2025, 3:42 AM
2 points
0
in reply to: gwern’s comment on: Ivan Vendrov’s Shortform
“harness” is doing a lot of work there. If incoherent search processes are actually superior then VNM agents are not the type of pattern that is evolutionarily stable, so no “harnessing” is possible in the long term, more like a “dissolving into”.
Unless you’re using “VNM agent” to mean something like “the definitionally best agent”, in which case sure, but a VNM agent is a pretty precise type of algorithm defined by axioms that are equivalent to saying it is perfectly resistant to being Dutch booked.
Resistance to Dutch booking is cool, seems valuable, but not something I’d spend limited compute resources on getting six nines of reliability on. Seems like evolution agrees, so far: the successful organisms we observe in nature, from bacteria to humans, are not VNM agents and in fact are easily Dutch booked. The question is whether this changes as evolution progresses and intelligence increases.
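To make “easily Dutch booked” concrete, here is a minimal sketch of a money pump, the simplest Dutch-book-style exploit. It assumes a toy agent with cyclic preferences (A over B, B over C, C over A), which violates the transitivity axiom. The names `PREFERS`, `accepts_trade`, and `money_pump` are hypothetical and purely illustrative, not taken from any of the sources discussed.

```python
# Hypothetical toy agent with cyclic preferences: A > B > C > A.
# Each pair (x, y) means "x is strictly preferred to y".
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}


def accepts_trade(current, offered):
    """The agent swaps (and pays the fee) whenever it prefers the offered item."""
    return (offered, current) in PREFERS


def money_pump(start_item, wealth, fee, rounds):
    """Repeatedly offer the one item the agent prefers to what it holds."""
    item = start_item
    for _ in range(rounds):
        # With cyclic preferences there is always exactly one such item.
        offer = next(x for (x, y) in PREFERS if y == item)
        if accepts_trade(item, offer) and wealth >= fee:
            item, wealth = offer, wealth - fee
    return item, wealth


# Three trades bring the agent back to its starting item, three fees poorer.
final_item, final_wealth = money_pump("A", wealth=10, fee=1, rounds=3)
print(final_item, final_wealth)
```

The agent ends up holding exactly what it started with while having paid a fee on every trade. Whether closing that loophole is worth the compute is the open question above.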

Ivan Vendrov Apr 8, 2025, 4:07 PM
2 points
0
in reply to: abramdemski’s comment on: Ivan Vendrov’s Shortform
I agree Bayesian optimization should win out given infinite compute, but what makes you confident that evolutionary search under computational resource scarcity selects for anything like an explicit Bayesian optimizer or long term planner? (I say “explicit” because the Bayesian formalism has enough free parameters that you can post-hoc recast ~any successful algorithm as an approximation to a Bayesian ideal)

Ivan Vendrov’s Shortform

Ivan Vendrov Apr 8, 2025, 2:16 PM

6 points

9 comments LW link

Ivan Vendrov Apr 8, 2025, 2:16 PM
2 points
−2
on: Ivan Vendrov’s Shortform
Are instrumental convergence & Omohundro drives just plain false? If Lehman and Stanley are right in “Novelty Search and the Problem with Objectives” (https://www.cs.swarthmore.edu/~meeden/DevelopmentalRobotics/lehmanNoveltySearch11.pdf) later popularized in their book “Why Greatness Cannot Be Planned”, VNM-coherent agents that pursue goal stability will reliably be outcompeted by incoherent search processes pursuing novelty.

Ivan Vendrov Feb 7, 2025, 9:27 PM
8 points
2
on: Racing Towards Fusion and AI
Great, thought-provoking post. The AI research community certainly felt much more cooperative before it got an injection of startup/monopoly/winner-take-all thinking. Google Brain publishing the Transformer paper being a great example.
I wonder how much this truly is narrative, as opposed to AI being genuinely more winner-take-all than fusion in the economic sense. Certainly the hardware layer has proven quite winner-take-all so far with NVDA taking a huge fraction of the profit; same with adtech, the most profitable application of (last-generation) AI, where network effects and first mover advantages have led to the dominance of a couple of companies.
Global foundation model development efforts being pooled into an international consortium like ITER or CERN seems quite good to me in terms of defusing race dynamics. Perhaps we will get there in a few years if private capital loses interest in funding 100B+ training runs.

Ivan Vendrov Jan 22, 2025, 5:24 AM
0 points
−1
in reply to: Ashin’s comment on: Passages I Highlighted in The Letters of J.R.R.Tolkien
I think writing one of the best selling books of your century is extraordinary evidence you’ve understood something deep about human nature, which is more than most random rationalist bloggers can claim. But yes, it doesn’t imply you have a coherent philosophy or benevolent political program.

Ivan Vendrov Dec 30, 2024, 12:02 AM
3 points
0
in reply to: Stag’s comment on: Shallow review of technical AI safety, 2024
cuts off some nuance, I would call this the projection of the collective intelligence agenda onto the AI safety frame of “eliminate the risk of very bad things happening” which I think is an incomplete way of looking at how to impact the future
in particular I tend to spend more time thinking about future worlds that are more like the current one in that they are messy and confusing and have very terrible and very good things happening simultaneously and a lot of the impact of collective intelligence tech (for good or ill) will determine the parameters of that world

Ivan Vendrov Dec 29, 2024, 1:13 PM
6 points
2
on: Shallow review of technical AI safety, 2024
Thanks, this is a really helpful broad survey of the field. Would be useful to see a one-screen-size summary, perhaps a table with the orthodox alignment problems as one axis?
I’ll add that the collective intelligence work I’m doing is not really “technical AI safety” but is directly targeted at orthodox problems 11 (“Someone else will deploy unsafe superintelligence first”) and 13 (“Fair, sane pivotal processes”), and targeting all alignment difficulty worlds, not just the optimistic one (in particular, I think human coordination becomes more not less important in the pessimistic world). I write more of how I think about pivotal processes in general in AI Safety Endgame Stories but it’s broadly along the lines of von Neumann’s:
“For progress there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment.”

Ivan Vendrov Nov 26, 2024, 4:53 PM
11 points
4
in reply to: Lao Mein’s comment on: Passages I Highlighted in The Letters of J.R.R.Tolkien
I find that surprising, do you care to elaborate? I don’t think his worldview is complete, but he cares deeply about a lot of things I value too, which modern society seems not to value. I would certainly be glad to have him in my moral parliament.

Ivan Vendrov Nov 26, 2024, 4:15 PM
8 points
3
in reply to: cousin_it’s comment on: Passages I Highlighted in The Letters of J.R.R.Tolkien
Feels connected to his distrust of “quick, bright, standardized, mental processes”, and the obsession with language. It’s like his mind is relentlessly orienting to the territory, refusing to accept anyone else’s map. Which makes it harder to be a student but easier to discover something new. Reminds me of Geoff Hinton’s advice to not read the literature before engaging with the problem yourself.

I, Token

Ivan Vendrov Nov 25, 2024, 2:20 AM

14 points

2 comments, 3 min read, LW link

(nothinghuman.substack.com)

Passages I Highlighted in The Letters of J.R.R.Tolkien

Ivan Vendrov Nov 25, 2024, 1:47 AM

139 points

38 comments, 31 min read, LW link

Ivan Vendrov Oct 28, 2024, 4:09 AM
29 points
9
on: The hostile telepaths problem
I like this a lot! A few scattered thoughts
- This theory predicts and explains “therapy-resistant dissociation”, or the common finding that none of the “woo” exercises like focusing, meditation, etc., actually work (cf. Scott’s experience as described in https://www.astralcodexten.com/p/are-woo-non-responders-defective). If there’s an active strategy of self-deception, you’d expect people to react negatively (or learn to not react via yet deeper levels of self-deception) to straightforward attempts to understand and untangle their psychology.
- It matches and extends Robert Trivers’ theory of self-deception, wherein he predicts that when your mind is the site of a conflict between two sub-parts, the winning one will always be subconscious, because the conscious mind is visible to the subconscious but not vice versa, and being visible makes you weak. Thus, counterintuitively, the mind we are conscious of—in your phrase the false self—is always the losing part.
- It connects to a common question I have for people doing meditation seriously—why exactly do you want to make the subconscious conscious? Why is it such a good thing to “become more conscious”? Now I can make the question more precise—why do you think it’s safe to have more access to your thoughts and feelings than your subconscious gave you? And how exactly do you plan to deal with all the hostile telepaths out there (possibly including parts of yourself?). I expect most people find themselves dealing with (partly) hostile telepaths all the time, and so Occlumency is genuinely necessary unless one lives in an extraordinarily controlled environment such as a monastery.
- Social deception games like Avalon or Diplomacy provide a fertile ground for self- and group experimentation with the ideas in this essay.

Ivan Vendrov Oct 18, 2024, 8:51 PM
1 point
0
in reply to: zhukeepa’s comment on: Secular interpretations of core perennialist claims
I know this isn’t the central point of your life reviews section but curious if your model has any lower bound on life review timing—if not minutes to hours, at least seconds? milliseconds? (1 ms being a rough lower bound on the time for a signal to travel between two adjacent neurons).
If it’s at least milliseconds it opens the strange metaphysical possibility of certain deaths (e.g. from very intense explosions) being exempt from life reviews.

Ivan Vendrov Aug 31, 2024, 9:08 AM
8 points
0
on: Extended Interview with Zhukeepa on Religion
Really appreciated this exchange, Ben & Alex have rare conversational chemistry and ability to sense-make productively at the edge of their world models.
I mostly agree with Alex on the importance of interfacing with extant institutional religion, though less sure that one should side with pluralists over exclusivists. For example, exclusivist religious groups seem to be the only human groups currently able to reproduce themselves, probably because exclusivism confers protection against harmful memes and cultural practices.
I’m also pursuing the vision of a decentralized singleton as an alternative to Moloch or turnkey totalitarianism, although it’s not obvious to me how the psychological insights of religious contemplatives are crucial here, rather than skilled deployment of social technology like the common law, nation states, mechanism design, cryptography, recommender systems, LLM-powered coordination tools, etc. Is there evidence that “enlightened” people, for some sense of “enlightened”, are in fact better at cooperating with each other at scale?
If we do achieve existential security through building a stable decentralized singleton, it seems much more likely that it would be the result of powerful new social tech, rather than the result of intervention on individual psychology. I suppose it could be the result of both with one enabling the other, like the printing press enabling the Reformation.

Ivan Vendrov Aug 28, 2024, 3:35 PM
5 points
1
in reply to: habryka’s comment on: O O’s Shortform
definitely agree there’s some power-seeking equivocation going on, but wanted to offer a less sinister explanation from my experiences in AI research contexts. Seems that a lot of equivocation and blurring of boundaries comes from people trying to work on concrete problems and obtain empirical information. a thought process like
1. alignment seems maybe important?
2. ok what experiment can I set up that lets me test some hypotheses
3. can’t really test the long-term harms directly, let me test an analogue in a toy environment or on a small model, publish results
4. when talking about the experiments, I’ll often motivate them by talking about long-term harm
Not too different from how research psychologists will start out trying to understand the Nature of Mind and then run an n=20 study on undergrads because that’s what they had budget for. We can argue about how bad this equivocation is for academic research, but it’s a pretty universal pattern and well-understood within academic communities.
The unusual thing in AI is that researchers have most of the decision-making power in key organizations, so these research norms leak out into the business world, and no-one bats an eye at a “long-term safety research” team that mostly works on toy and short term problems.
This is one reason I’m more excited about building up “AI security” as a field and hiring infosec people instead of ML PhDs. My sense is that the infosec community actually has good norms for thinking about and working on things-shaped-like-existential-risks, and the AI x-risk community should inherit those norms, not the norms of academic AI research.

Ivan Vendrov Aug 28, 2024, 2:57 PM
1 point
0
in reply to: gwern’s comment on: Bing Chat is blatantly, aggressively misaligned
by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn’t be a ‘warning shot’, it’d just be a ‘shot’ or ‘disaster’.)
Yours is the more direct definition but from context I at least understood ‘warning shot’ to mean ‘disaster’, on the scale of a successful terrorist attack, where the harm is large and undeniable and politicians feel compelled to Do Something Now. The ‘warning’ is not of harm but of existential harm if the warning is not heeded.
I do still expect such a warning shot, though as you say it could very well be ignored even if there are large undeniable harms (e.g. if a hacker group deploys a rogue AI that causes a trillion dollars of damage, we might take that as warning about terrorism or cybersecurity not about AI).

Ivan Vendrov Jul 22, 2024, 11:22 PM
4 points
2
on: Coalitional agency
Agreed that coalitional agency is somehow more natural than squiggly-optimizer agency. Besides people, another class of examples is historical empires (like the Persian and then Roman) which were famously lenient^[1] and respectful of local religious and cultural traditions; i.e. optimized coalition builders that offered goal-stability guarantees to their subagent communities, often stronger guarantees than those communities could expect by staying independent.
This extends my argument in Cooperators are more powerful than agents—in a world of hierarchical agency, evolution selects not for world-optimization / power-seeking but for cooperation, which looks like coalition-building (negotiation?) at the higher levels of organization and coalition-joining (domestication?) at the lower levels.

I don’t see why this tendency should break down at higher levels of intelligence; if anything it should get stronger as power-seeking patterns are detected early and destroyed by well-coordinated defensive coalitions. There’s still no guarantee that coalitional superintelligence will respect “human values” any more than we respect the values of ants; but contra Yudkowsky-Bostrom-Omohundro, doom is not the default outcome.
1. ^ if you surrendered!

Ivan Vendrov Jul 1, 2024, 4:31 PM
25 points
4
in reply to: Vladimir_Nesov’s comment on: Habryka’s Shortform Feed
Correct, I was not offered such paperwork nor any incentives to sign it. Edited my post to include this.
