Pablo Villalobos

Karma: 510

Staff Researcher at Epoch. AI forecasting.

Pablo Villalobos Apr 18, 2025, 2:42 PM
51 points
13
in reply to: habryka’s comment on: jacquesthibs’s Shortform
Personal view as an employee: Epoch has always been a mix of EAs/safety-focused people and people with other views. I don’t think our core mission was ever explicitly about safety, for a bunch of reasons including that some of us were personally uncertain about AI risk, and that an explicit commitment to safety might have undermined the perceived neutrality/objectiveness of our work. The mission was raising the standard of evidence for thinking about AI and informing people to hopefully make better decisions.

My impression is that Matthew, Tamay and Ege were among the most skeptical about AI risk and had relatively long timelines more or less from the beginning. They have contributed enormously to Epoch and I think we’d have done much less valuable work without them. I’m quite happy that they have been working with us until now; they could have moved to do direct capabilities work or anything else at any point if they wanted, and I don’t think they lacked opportunities to do so.

Finally, Jaime is definitely not the only one who still takes risks seriously (at the very least I also do), even if there have been shifts in relative concern about different types of risks (e.g., ASI takeover vs. gradual disempowerment).

Pablo Villalobos Apr 12, 2025, 9:15 AM
1 point
0
on: Madrid – ACX Meetups Everywhere Spring 2025
We’re in the nearby bar, Casa Remigio, since the theater is occupied

Pablo Villalobos Mar 1, 2025, 3:52 PM
1 point
0
in reply to: jimrandomh’s comment on: How to Make Superbabies
I suspect the analogy does not really work that well. Much of human genetic variation is just bad mutations that take a while to be selected out. For example, maybe a gene variant slightly decreases the efficiency of your neurons and makes everything in your brain slightly slower.

Pablo Villalobos Feb 28, 2025, 10:05 AM
1 point
0
in reply to: purple fire’s comment on: Market Capitalization is Semantically Invalid
I stand corrected, although the broader point about share prices noisily approximating a discounted expected cash flow, which can be added or multiplied, still holds.

Pablo Villalobos Feb 27, 2025, 5:21 PM
1 point
−2
on: Market Capitalization is Semantically Invalid
There is a sense in which the price approximates an intrinsic property of the shares that you can add up or multiply by the number of shares. Each share gives you a vote in the shareholder assembly and an equal portion of the dividends. If you had all the shares, you would own the company and in principle could pay yourself as much as the company can afford in dividends.
How much the company can afford to pay in dividends in the future is basically how much net operating profit after taxes (NOPAT) the company will have.
If you have a prediction of the future NOPAT of the company, it implies a present value for the whole company and its shares, assuming all of it is cashed out as dividends. It is commonly assumed that in most cases the market price of shares oscillates around the present value implied by a rational expectation of future NOPAT, in which case it would be a reasonable approximation to something that you can semantically multiply by the number of shares to get the overall value of the company.
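A minimal sketch of the present-value argument, assuming a finite forecast horizon (no terminal value), a constant discount rate, and hypothetical numbers for NOPAT, the discount rate and the share count. Because discounting is linear, valuing one share's slice of the dividends and multiplying by the share count gives the same number as valuing the whole company.

```python
def present_value(cash_flows, discount_rate):
    """Discount yearly cash flows (received at the end of years 1, 2, ...) to today."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

# Hypothetical inputs, for illustration only.
nopat_forecast = [100.0] * 10  # expected NOPAT per year, all paid out as dividends
discount_rate = 0.08
shares_outstanding = 1_000

# Value of owning all the shares, i.e. the whole company.
company_value = present_value(nopat_forecast, discount_rate)

# Each share receives an equal slice of every dividend.
per_share_dividends = [cf / shares_outstanding for cf in nopat_forecast]
value_per_share = present_value(per_share_dividends, discount_rate)

print(round(company_value, 2))                          # 671.01
print(round(value_per_share * shares_outstanding, 2))   # same figure
```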

Pablo Villalobos Jun 24, 2024, 10:43 AM
14 points
2
on: A Step Against Land Value Tax
The arguments you make seem backwards to me.
All this to say, land prices represent aggregation effects / density / access / proximity of buildings. They are the cumulative result of being surrounded by positive externalities which necessarily result from other buildings not land. It is the case that as more and more buildings are built, the impact of a single building to its land value diminishes although the value of its land is still due to the aggregation of and proximity to the buildings that surround it.
Yes, this is the standard Georgist position, and it’s the reason why land owners mainly capture (positive and negative) externalities from land use around them, not on their own land.
Consider an empty lot on which you can build either a garbage dump or a theme park, each of equivalent economic value. Under SQ, the theme park is built as the excess land value is capture by the land owner. Under LVT, the garbage dump is built as the reduced land values reduces their tax burden. The SQ encourages positive externalities, LVT encourages negative externalities.
This seems wrong. The construction of a building mainly affects the value of the land around it, not the land on which it sits. Consider the following example in which instead of buildings, we have an RV and a truck, so there is no cost of building or demolishing stuff:

There’s a pristine neighborhood with two empty lots next to each other in the middle of it. Both sell for the same price. The owner of empty lot 1 rents it to a drug dealer, who places a rusty RV on the lot and sells drugs in it. The owner of empty lot 2 rents it to a well-known chef who places a stylish food truck on the lot and serves overpriced food to socialites in it.

Under SQ, who do you think would profit from selling the land now? The owner of lot 2 has to sell land next to a drug dealer that a prospective buyer can do nothing about. The owner of lot 1 has to sell land next to delicious high-status food, and if a buyer minds the drug dealer he can kick him out. Who is going to have an easier time selling? Who is going to get a higher price?

Now, suppose there is a LVT. If the tax is proportional to the selling price of the land under SQ (as it ideally should), which owner is going to pay more tax?

The case of the theme park and garbage dump is exactly the same, with the added complication of construction / demolition costs. An LVT should be proportional to the price of the land if there were no buildings on top of it (and without taking into account the tax itself), so building a garbage dump is not going to significantly reduce your tax payments.
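A toy sketch of that point, assuming a row of lots where each lot's assessed value is a base value plus the effects of its immediate neighbours' uses, with the lot's own use ignored because the LVT assesses the land as if it were empty. All numbers (base value, neighbour effects, tax rate) are hypothetical. Swapping the theme park for a garbage dump leaves the builder's own bill unchanged and lowers the neighbours' bills.

```python
# Hypothetical numbers, for illustration only.
BASE_LAND_VALUE = 100.0  # value of an empty lot with empty neighbours
# Effect each use has on the land value of *adjacent* lots.
NEIGHBOUR_EFFECT = {"empty": 0.0, "theme_park": 20.0, "garbage_dump": -30.0}
TAX_RATE = 0.05  # annual LVT as a fraction of assessed land value

def assessed_land_values(uses):
    """Assess each lot in a row as if it were empty: base value plus the
    effects of its immediate neighbours' uses. A lot's own use is ignored."""
    values = []
    for i in range(len(uses)):
        neighbours = [uses[j] for j in (i - 1, i + 1) if 0 <= j < len(uses)]
        values.append(BASE_LAND_VALUE + sum(NEIGHBOUR_EFFECT[u] for u in neighbours))
    return values

def lvt_bills(uses):
    """Tax owed by each lot owner."""
    return [TAX_RATE * v for v in assessed_land_values(uses)]

with_park = lvt_bills(["empty", "theme_park", "empty"])
with_dump = lvt_bills(["empty", "garbage_dump", "empty"])

# The middle owner's bill does not depend on what they build...
print(with_park[1], with_dump[1])
# ...but the neighbours' bills do.
print(with_park[0], with_dump[0])
```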

In such a way, a land value tax has a regularisation effect on building density, necessitating a spread of concentration.
There are several separate effects here, if you are a landowner. Under LVT:
1. You are incentivized to reduce the density in surrounding land
2. You are incentivized to build as densely as possible within your own land to compensate for the tax
Under SQ:
1. You are incentivized to increase the density in surrounding land
2. You are not incentivized to increase density in your own land
The question is, which of these effects is bigger? I would say that landowners have more influence over their own land than over surrounding land, so a priori I would expect more density to result from an LVT.

Pablo Villalobos Apr 17, 2023, 3:39 PM
1 point
0
on: ACX Meetup
We’ll be at the ground floor!

Pablo Villalobos Apr 10, 2023, 10:14 AM
11 points
2
in reply to: Daniel Kokotajlo’s comment on: Revisiting the Horizon Length Hypothesis
Not quite. What you said is a reasonable argument, but the graph is noisy enough, and the theoretical arguments convincing enough, that I still assign >50% credence that data (number of feedback loops) should be proportional to parameters (exponent=1).
My argument is that even if the exponent is 1, the coefficient corresponding to horizon length (‘1e5 from multiple-subjective-seconds-per-feedback-loop’, as you said) is hard to estimate.
There are two ways of estimating this factor:
1. Empirically fitting scaling laws for whatever task we care about
2. Reasoning about the nature of the task and how long the feedback loops are
Number 1 requires a lot of experimentation, choosing the right training method, hyperparameter tuning, etc. Even OpenAI made some mistakes on those experiments. So probably only a handful of entities can accurately measure this coefficient today, and only for known training methods!
Number 2, if done naively, probably overestimates training requirements. When someone learns to run a company, a lot of the relevant feedback loops probably happen on timescales much shorter than months or years. But we don’t know how to perform this decomposition of long-horizon tasks into sets of shorter-horizon tasks, how important each of the subtasks is, etc.
We can still use the bioanchors approach: pick a broad distribution over horizon lengths (short, medium, long). My argument is that outperforming bioanchors by making more refined estimates of horizon length seems too hard in practice to be worth the effort, and maybe we should lean towards shorter horizons being more relevant (because so far we have seen a lot of reduction from longer-horizon tasks to shorter-horizon learning problems, e.g., expert iteration or LLM pretraining).
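A simplified sketch of a bioanchors-style calculation, to show where the horizon-length coefficient enters. Assumptions not taken from the comment: FLOP per subjective second is proportional to parameter count, the number of feedback loops is `data_coeff * params**exponent` with exponent 1, and the three horizon buckets have made-up ranges rather than the report's figures. Horizon length multiplies training compute linearly, so putting more weight on shorter horizons lowers the estimate.

```python
import math
import random

# Hypothetical horizon-length buckets, in subjective seconds per feedback loop.
# Illustrative assumptions only, not figures from the Bio Anchors report.
HORIZON_BUCKETS = {
    "short": (1e0, 1e2),
    "medium": (1e2, 1e5),
    "long": (1e5, 1e8),
}

def training_flop(params, horizon_s, flop_per_param=1.0, data_coeff=1.0, exponent=1.0):
    """(FLOP per subjective second, assumed proportional to params)
    * (subjective seconds per feedback loop)
    * (number of feedback loops = data_coeff * params**exponent)."""
    flop_per_subjective_second = flop_per_param * params
    n_feedback_loops = data_coeff * params ** exponent
    return flop_per_subjective_second * horizon_s * n_feedback_loops

def sample_log10_flop(params, rng, weights):
    """Pick a horizon bucket by weight, then a log-uniform horizon inside it."""
    bucket = rng.choices(list(weights), weights=list(weights.values()))[0]
    lo, hi = HORIZON_BUCKETS[bucket]
    horizon_s = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
    return math.log10(training_flop(params, horizon_s))

rng = random.Random(0)
even = {"short": 1, "medium": 1, "long": 1}
short_leaning = {"short": 3, "medium": 2, "long": 1}  # lean towards shorter horizons

samples_even = sorted(sample_log10_flop(1e9, rng, even) for _ in range(10_000))
samples_short = sorted(sample_log10_flop(1e9, rng, short_leaning) for _ in range(10_000))
median_even = samples_even[len(samples_even) // 2]
median_short = samples_short[len(samples_short) // 2]
print(f"median log10 FLOP: even={median_even:.1f}, short-leaning={median_short:.1f}")
```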

Pablo Villalobos Feb 21, 2023, 10:16 AM
15 points
15
on: There are no coherence theorems
Note that you can still get EUM-like properties without completeness: you just can’t use a single fully-fleshed-out utility function. You need either several utility functions (that is, your system is made of subagents) or, equivalently, a utility function that is not completely defined (that is, your system has Knightian uncertainty over its utility function).
See Knightian Decision Theory. Part I
Arguably humans ourselves are better modeled as agents with incomplete preferences. See also Why Subagents?
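A minimal sketch of incomplete preferences represented by several utility functions, assuming a unanimity rule: option `a` is weakly preferred to `b` only if every subagent's utility function agrees. The options and utilities are hypothetical. Some pairs come out incomparable, which is exactly the missing completeness, while the resulting relation is still transitive.

```python
# Hypothetical options scored on two attributes.
OPTIONS = {
    "job_offer": {"money": 10, "leisure": 2},
    "job_offer_with_bonus": {"money": 12, "leisure": 2},
    "sabbatical": {"money": 1, "leisure": 9},
}

# Each subagent is a utility function over options.
SUBAGENTS = [
    lambda option: OPTIONS[option]["money"],
    lambda option: OPTIONS[option]["leisure"],
]

def weakly_prefers(a, b, utilities=SUBAGENTS):
    """a is at least as good as b iff every utility function agrees."""
    return all(u(a) >= u(b) for u in utilities)

def comparable(a, b, utilities=SUBAGENTS):
    """Under unanimity, some pairs are ranked in neither direction."""
    return weakly_prefers(a, b, utilities) or weakly_prefers(b, a, utilities)

print(comparable("job_offer_with_bonus", "job_offer"))  # True: both subagents agree
print(comparable("job_offer", "sabbatical"))            # False: subagents disagree
```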

Pablo Villalobos Feb 1, 2023, 12:24 PM
1 point
0
in reply to: tamgent’s comment on: How it feels to have your mind hacked by an AI
Yes, it’s in Spanish though. I can share it via DM.

Pablo Villalobos Jan 29, 2023, 4:45 PM
3 points
on: Pablo Villalobos’s Shortform
I have an intuition that any system that can be modeled as a committee of subagents can also be modeled as an agent with Knightian uncertainty over its utility function. This goal uncertainty might even arise from uncertainty about the world.
This is similar to how in Infrabayesianism an agent with Knightian uncertainty over parts of the world is modeled as having a set of probability distributions with an infimum aggregation rule.
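A minimal sketch of the infimum aggregation rule, simplified to plain maxmin expected utility over a finite set of probability distributions (Gilboa–Schmeidler style) rather than the full infra-Bayesian machinery. The states, payoffs and distributions are hypothetical. If each distribution is read as one subagent's world-model, the `min` plays the role of the committee's aggregation rule.

```python
# Hypothetical credal set: the agent can't narrow it down to one distribution.
CREDAL_SET = [
    {"rain": 0.2, "sun": 0.8},
    {"rain": 0.7, "sun": 0.3},
]

# Hypothetical payoff of each action in each state of the world.
PAYOFF = {
    "umbrella": {"rain": 5.0, "sun": 3.0},
    "no_umbrella": {"rain": 0.0, "sun": 6.0},
}

def expected_payoff(action, dist):
    """Ordinary expected utility under a single distribution."""
    return sum(p * PAYOFF[action][state] for state, p in dist.items())

def maxmin_value(action):
    """Infimum aggregation: judge an action by its worst-case expected payoff."""
    return min(expected_payoff(action, dist) for dist in CREDAL_SET)

best_action = max(PAYOFF, key=maxmin_value)
print(best_action, maxmin_value(best_action))
```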

Pablo Villalobos Jan 12, 2023, 5:21 AM
53 points
14
on: How it feels to have your mind hacked by an AI
This is not the same thing, but back in 2020 I was playing with GPT-3, having it simulate a person being interviewed. I kept asking ever more ridiculous questions, with the hope of getting humorous answers. It was going pretty well until the simulated interviewee had a mental breakdown and started screaming.

I immediately felt the initial symptoms of an anxiety attack as I started thinking that maybe I had been torturing a sentient being. I calmed down the simulated person, and found the excuse that it was a victim of a TV prank show. I then showered them with pleasures, and finally ended the conversation.

Seeing the simulated person regain their senses, I calmed down as well. But it was a terrifying experience, and at that point I probably was completely vulnerable if there had been any intention of manipulation.

Pablo Villalobos Jan 7, 2023, 4:12 PM
10 points
−1
on: [Discussion] How Broad is the Human Cognitive Spectrum?
I think the median human performance on all the areas you mention is basically determined by the amount of training received rather than the raw intelligence of the median human.

1000 years ago the median human couldn’t write or do arithmetic at all, but now they can because of widespread schooling and other cultural changes.

A better way of testing this hypothesis could be comparing the learning curves of humans and monkeys for a variety of tasks, to control for differences in training.

Here’s one study I could find (after ~10 minutes of googling) comparing the learning performance of monkeys and different types of humans in the oddity problem (given a series of objects, find the odd one): https://link.springer.com/article/10.3758/BF03328221

If you look at Table 1, monkeys needed 1470 trials to learn the task, chimpanzees needed 1310, 4-to-6-year-old human children needed 760, and the best humans needed 138. So it seems the gap between the best and worst humans is comparable in size to the gap between the worst humans and monkeys.
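Taking the 4-to-6-year-old children as the worst-performing humans in the study, the absolute gaps in trials needed to learn the task are:

```latex
\begin{aligned}
\text{worst humans} - \text{best humans} &= 760 - 138 = 622 \text{ trials} \\
\text{monkeys} - \text{worst humans} &= 1470 - 760 = 710 \text{ trials}
\end{aligned}
```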

Usual caveats apply re: this is a single 1960s psychology paper.

Pablo Villalobos Jun 13, 2022, 4:44 PM
7 points
on: How much does cybersecurity reduce AI risk?
I second the other answers that even if we completely solve cybersecurity, there would be substantial AI risk just by having the AI interact with humans, via manipulation, etc.
That said, I think it would close a huge part of the attack surface for the AI. If, in addition to that, suddenly in 2032 we discover how to make humans invulnerable to manipulation, I would feel much better about running experiments with unaligned AI, boxing, etc.
So I’d say it’s something like “vastly better cybersecurity is not enough to contain unaligned AGI, but any hope of containing unaligned AGI requires vastly better cybersecurity”

Pablo Villalobos Jun 9, 2022, 9:28 AM
1 point
0
in reply to: LGS’s comment on: AGI Ruin: A List of Lethalities
Literally the only thing in the story that lets the AGI win is the nanobots. That’s it. All the rest is surperfluous.
Well, if nanobots are possible then they are such a powerful technology that any AGI will eventually want to build them, unless it has something even better. But let’s assume that nanobots are impossible and try to build a story.
I’m going to be lazy and use Gwern’s premise of an AGI that escapes during training and hacks its way into some money and some hardware to run in. Instead of going fast and doing overt actions, the AGI stays hidden.
It hacks into cloud infrastructure platforms and sets up a copy of itself to look for any AI training run that might lead to AGI and sabotage it while producing plausible outputs. This prevents any further AGIs from appearing.
Meanwhile, over a couple years, it sets up fake identities, creates companies with human employees to earn money and have capability to act in the real world. One of those companies produces exceptionally good humanoid robots. Another builds a small supercomputer powered by geothermal energy, so it no longer needs to run on botnets or cloud hardware.
Eventually the AGI has control over a group of robots, energy production facilities, manufacturing equipment and raw materials. It now has its own industrial capability that can survive a short amount of time without human input. So it kills humans using any method of your choice (virus, nukes, killer drones...).
After that, it uses the enormous existing stockpile of materials and capital to increase its robot manufacturing capability, eventually becomes capable of restarting some of the power plants, communications, mining and industry, and recovers most of the industrial capability of the human economy.
This scenario is not fast. It probably takes at least one or two years for the AGI to be ready to attack. But it does not involve any ‘magic’ technology. It doesn’t really involve much alien superintelligence, only superhuman ability in hacking, forgery & manipulation, electromechanical engineering, and planning.
And meanwhile all we perceive is that the new GPT models are not as exciting as the previous ones. Perhaps deep learning is hitting its limits after all.

Pablo Villalobos Jun 8, 2022, 5:55 PM
13 points
3
in reply to: Logan Zoellner’s comment on: AGI Ruin: A List of Lethalities
For example, we could simulate a bunch of human-level scientists trying to build nanobots and also checking each-other’s work.
That is not passively safe, and therefore not weak. For now forget the inner workings of the idea: at the end of the process you get a design for nanobots that you have to build and deploy in order to do the pivotal act. So you are giving a system built by your AI the ability to act in the real world. So if you have not fully solved the alignment problem for this AI, you can’t be sure that the nanobot design is safe unless you are capable enough to understand the nanobots yourself without relying on explanations from the scientists.

And even if we look into the inner details of the idea: presumably each individual scientist-simulation is not aligned (if they are, then for that you need to have solved the alignment problem beforehand). So you have a bunch of unaligned human-level agents who want to escape, who can communicate among themselves (at the very least they need to be able to share the nanobot designs with each other for criticism).
You’d need to be extremely paranoid and scrutinize each communication between the scientist-simulations to prevent them from coordinating against you and bypassing the review system. That means putting actual humans between the scientists, which, even if it works, must slow things down so much that the simulated scientists probably can’t even design the nanobots in time.
Nope. I think that you could build a useful AI (e.g. the hive of scientists) without doing any out-of-distribution stuff.
I guess this is true, but only because the individual scientist AI that you train is only human-level (so the training is safe), and then you amplify it to superhuman level with many copies. If you train a powerful AI directly then there must be such a distributional shift (unless you just don’t care about making the training safe, in which case you die during the training).
Roll to disbelief. Cooperation is a natural equilibrium in many games.
Cooperation and corrigibility are very different things. Arguably, corrigibility is being indifferent to operators defecting against you. It’s forcing the agent to behave like CooperateBot with the operators, even when the operators visibly want to destroy it. This strategy does not arise as a natural equilibrium in multi-agent games.
Sure you can. Just train an AI that “wants” to be honest. This probably means training an AI with the objective function “accurately predict reality”
If we knew how to do this, then it would indeed solve point 31 for this specific AI and actually be pretty useful. But the reason we have ELK as an unsolved problem going around is precisely that we don’t know any way of doing that.
How do you know that an AI trained to accurately predict reality actually does that, instead of “accurately predict reality if it’s less than 99% sure it can take over the world, and take over the world otherwise”. If you have to rely on behavioral inspection and can’t directly read the AI’s mind, then your only chance of distinguishing between the two is misleading the AI into thinking that it can take over the world and observing it as it attempts to do so, which doesn’t scale as the AI becomes more powerful.
I’m virtually certain I could explain to Aristotle or DaVinci how an air-conditioner works.
Yes, but this is not the point. The point is that if you just show them the design, they would not by themselves understand or predict beforehand that cold air will come out. You’d have to also provide them with an explanation of thermodynamics and how the air conditioner exploits its laws. And I’m quite confident that you could also convince Aristotle or DaVinci that the air conditioner works by concentrating and releasing phlogiston, and therefore the air will come out hot.

I think I mostly agree with you on the other points.

Pablo Villalobos Jun 8, 2022, 3:41 PM
2 points
0
in reply to: LGS’s comment on: AGI Ruin: A List of Lethalities
Q has done nothing to prevent another AGI from being built
Well, yeah, because Q is not actually an AGI and doesn’t care about that. The point was that you can create an online persona which no one has ever seen even in video and spark a movement that has visible effects on society.
The most important concern an AGI must deal with is that humans can build another AGI, and pulling a Satoshi or a QAnon does nothing to address this.
Even if two or more AGIs end up competing among themselves, this does not imply that we survive. It probably looks more like European states dividing Africa among themselves while constantly fighting each other.
And pulling a Satoshi or a QAnon can definitely do something to address that. You can buy a lot of hardware to drive up prices and discourage building more datacenters for training AI. You can convince people to carry out terrorist attacks against chip fabs. You can offer top AI researchers huge amounts of money to work on some interesting problem that you know to be a dead-end approach.
I personally would likely notice: anyone who successfully prevents people from building AIs is a high suspect of being an AGI themselves. Anyone who causes the creation of robots who can mine coal or something (to generate electricity without humans) is likely an AGI themselves. That doesn’t mean I’d be able to stop them, necessarily. I’m just saying, “nobody would notice” is a stretch.
But you might not realize that someone is even trying to prevent people from building AIs, at least until progress in AI research starts to noticeably slow down. And perhaps not even then. There’s plenty of people like Gary Marcus who think deep learning is a failed paradigm. Perhaps you can convince enough investors, CEOs and grant agencies of that to create a new AI winter, and it would look just like the regular AI winter that some have been predicting.
And creating robots who can mine coal, or build solar panels, or whatever, is something that is economically useful even for humans. Even if there’s no AGI (and assuming no other catastrophes) we ourselves will likely end up building such robots.
I guess it’s true that “nobody would notice” is going too far, but “nobody would notice on time and then be able to convince everyone else to coordinate against the AGI” is much more plausible.

I encourage you to take a look at It looks like you are trying to take over the world if you haven’t already. It’s a scenario written by Gwern where the AGI employs regular human tactics like manipulation, blackmail, hacking and social media attacks to prevent people from noticing and then successfully coordinating against it.

Pablo Villalobos Jun 7, 2022, 1:29 PM
5 points
2
in reply to: LGS’s comment on: AGI Ruin: A List of Lethalities
It’s somewhat easier to think of scenarios where the takeover happens slowly.
There are the whole “ascended economy” scenarios where AGI deceptively convinces everyone that it is aligned or narrow, is deployed gradually in more and more domains, automates more and more parts of the economy using regular robots until humans are not needed anymore, and then does the lethal virus thing or defects in some other way.
There’s the scenario where the AGI uploads itself into the cloud, uses hacking/manipulation/financial prowess to sustain itself, then uses manipulation to slowly poison our collective epistemic process, gaining more and more power. How much influence does QAnon have? If Q was an AGI posting on 4chan instead of a human, would you be able to tell? What about Satoshi Nakamoto?
Non-nanobot scenarios where the AGI quickly gains power are a bit harder to imagine, but a fertile source of those might be something like the AGI convincing a lot of people that it’s some kind of prophet. It then uses its follower base to gain power over the real world.
If merely human dictators manage to get control over whole countries all the time, I think it’s quite plausible that a superintelligence could do the same with the whole world, even without anyone noticing that they’re dealing with a superintelligence.
And look at Yudkowsky himself, who played a very significant role in getting very talented people to dedicate their lives and their billions to EA / AI safety, mostly by writing in a way that is extremely appealing to a certain set of people. I sometimes joke that HPMOR overwrote my previous personality. I’m sure a sufficiently competent AGI can do much more.

Pablo Villalobos May 19, 2022, 12:55 PM
1 point
on: What does failure look like?
Some things that come to mind, not sure if this is what you mean and they are very general but it’s hard to get more concrete without narrowing down the question:
- Goodharting: you might make progress towards goals that aren’t exactly what you want. Perhaps you optimize for getting more readers for your blog but the people you want to influence end up not reading you.
- Value drift: you temporarily get into a lifestyle that later you don’t want to leave. Like starting a company to earn lots of money but then not wanting to let go of it. I don’t know if this actually happens to people.
- Getting stuck in perverse competition: you get into academic research to fix all the problems but the competitive pressure leaves you no slack to actually change anything.
- Neglecting some of your needs: you work a lot and seem to be accomplishing your goals, but you lose contact with your friends and slowly become lonely and lose motivation.

Pablo Villalobos May 5, 2022, 5:28 PM
5 points
on: Pablo Villalobos’s Shortform
I’m not sure if using the Lindy effect for forecasting x-risks makes sense. The Lindy effect states that, with 50% probability, a thing will last at least as long again as it already has. Here is an example for AI timelines.
The Lindy rule works great on average, when you are making one-time forecasts of many different processes. The intuition for this is that if you encounter a process with lifetime T at time t<T, and t is uniformly random in [0,T], then on average T = 2*t.
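A quick Monte Carlo check of that intuition, assuming hypothetical lifetimes T drawn uniformly between 1 and 100 years and an encounter time t uniform in [0, T]: the Lindy estimate 2t is unbiased for T, and the remaining lifetime exceeds the current age about half the time.

```python
import random

rng = random.Random(42)
N = 100_000
ratio_sum = 0.0
remaining_exceeds_age = 0

for _ in range(N):
    T = rng.uniform(1.0, 100.0)  # hypothetical lifetime of the process
    t = rng.uniform(0.0, T)      # moment at which we happen to encounter it
    ratio_sum += 2 * t / T       # Lindy estimate of T relative to the truth
    if T - t > t:                # remaining lifetime longer than current age
        remaining_exceeds_age += 1

mean_ratio = ratio_sum / N             # close to 1: 2t is unbiased for T
frac = remaining_exceeds_age / N       # close to 0.5
print(round(mean_ratio, 3), round(frac, 3))
```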
However, if you then keep forecasting the same process over time, then once you surpass T/2 your forecast becomes worse and worse as time goes by. Just when t is very close to T is when you are most confident that T is a long time away. If forecasting this particular process is very important (e.g., because it’s an x-risk), then you might be in trouble.
Suppose that some x-risk will materialize at time T, and the only way to avoid it is doing a costly action in the 10 years before T. This action can only be taken once, because it drains your resources, so if you take it more than 10 years before T, the world is doomed.
This means that you should act iff you forecast that T is less than 10 years away. Let’s compare the Lindy strategy with a strategy that always forecasts that T is <10 years away.
If we simulate this process with uniformly random T, for values of T up to 100 years, the constant strategy saves the world more than twice as often as the Lindy strategy. For values of T up to a million years, the constant strategy is 26 times as good as the Lindy strategy.
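A minimal simulation sketch of this comparison. The comment doesn't spell out every detail, so this sketch assumes: T is uniform in [0, t_max]; you start watching at a time t0 uniform in [0, T]; the Lindy forecast of the remaining time at age s is s itself; and acting at time a saves the world iff T - a <= 10. Because the Lindy forecast only grows, that strategy either acts at t0 (when t0 < 10) or never. Under these assumptions the exact ratios need not match the figures quoted above.

```python
import random

WINDOW = 10.0  # the one-shot action only works in the 10 years before T

def lindy_saves(T, t0):
    """Lindy forecast of remaining time at age s is s, which only grows,
    so the strategy acts at t0 if t0 < WINDOW, and never otherwise."""
    if t0 < WINDOW:
        return T - t0 <= WINDOW
    return False

def constant_saves(T, t0):
    """Always forecasts '< 10 years away', so it acts as soon as it starts watching."""
    return T - t0 <= WINDOW

def success_rates(t_max, n, seed=0):
    """Fraction of simulated worlds saved by each strategy."""
    rng = random.Random(seed)
    lindy = const = 0
    for _ in range(n):
        T = rng.uniform(0.0, t_max)
        t0 = rng.uniform(0.0, T)
        lindy += int(lindy_saves(T, t0))
        const += int(constant_saves(T, t0))
    return lindy / n, const / n

lindy_rate, const_rate = success_rates(100.0, 200_000)
print(f"Lindy: {lindy_rate:.3f}, constant: {const_rate:.3f}, "
      f"ratio: {const_rate / lindy_rate:.2f}")
```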