“You are in love with Intelligence, until it frightens you. For your ideas are terrifying and your hearts are faint.”
Stephen Fowler
“Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth.”
Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky’s point about trying to sell an Oreo for $77 is that a billionaire isn’t automatically going to want to buy something off you if they don’t care about it (and neither would an ASI).
“I’m simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they’re value aligned. This is a claim that I don’t think has been established with any reasonable degree of rigor.”
I completely agree but I’m not sure anyone is arguing that smart AIs would immediately turn violent unless it was in their strategic interest.
I think I previously overvalued the model in which laziness/motivation/mood are primarily internal states that require internal solutions. For me, this model also generated a lot of guilt, because failing to be productive was a personal failure.
But is the problem a lack of “willpower” or is your brain just operating sub-optimally because you’re making a series of easily fixable health blunders?
Are you eating healthy?
Are you consuming large quantities of sugar?
Are you sleeping with your phone on your bedside table?
Are you deficient in any vitamins?
Is your sleep trash because you have been consuming alcohol?
Are you waking up at a consistent time?
Are you doing at least some exercise?
I find time spent addressing these and other similar deficits is usually more productive than trying to think your way out of a laziness spiral.
None of this is medical advice. My experience may not be applicable to you. Do your own research. I ate half a tub of ice cream 30 minutes ago.
seems like a big step change in its ability to reliably do hard tasks like this without any advanced scaffolding or prompting to make it work.
Keep in mind that o1 is utilising advanced scaffolding to facilitate Chain-Of-Thought reasoning, but it is just hidden from the user.
I’d like access to it.
I agree that the negative outcomes from technological unemployment do not get enough attention but my model of how the world will implement Transformative AI is quite different to yours.
Our current society doesn’t say “humans should thrive”, it says “professional humans should thrive”
Let us define workers to be the set of humans whose primary source of wealth comes from selling their labour. This is a very broad group that includes people colloquially called working class (manual labourers, baristas, office workers, teachers etc) but we are also including many people who are well remunerated for their time, such as surgeons, senior developers or lawyers.
Ceteris paribus, there is a trend that those who can perform more valuable, difficult and specialised work can sell their labour at a higher value. Among workers, those who earn more are usually “professionals”. I believe this is essentially the same point you were making.
However, this is not a complete description of who society allows to “thrive”. It neglects a small group of people with very high wealth. This is the group of people who have moved beyond needing to sell their labour and instead are rewarded for owning capital. It is this group who society says should thrive and one of the strongest predictors of whether you will be a member is the amount of wealth your parents give you.
The issue is that this small group owns a disproportionate share of equity in frontier AI companies.
Assuming we develop techniques to reliably align AGIs to arbitrary goals, there is little reason to expect private entities to intentionally give up power (doing so would be acting contrary to the interests of their shareholders).
Workers unable to compete with artificial agents will find themselves relying on the charity and goodwill of a small group of elites. (And of course, as technology progresses, this group will eventually include all workers.)
Those lucky enough to own substantial equity in AI companies will thrive as the majority of wealth generated by AI workers flows to them.
In itself, this scenario isn’t an existential threat. But I suspect many humans would consider their descendants being trapped in serfdom to be a very bad outcome.
I worry a focus on preventing the complete extinction of the human race means that we are moving towards AI Safety solutions which lead to rather bleak futures in the majority of timelines.[1]
[1]
My personal utility function considers permanent techno-feudalism, forever removing the agency of the majority of humans, to be only slightly better than everyone dying.
I suspect that some fraction of humans currently alive also consider a permanent loss of freedom to be only marginally better (or even worse) than death.
Assuming I blend in and speak the local language, within an order of magnitude of 5 million (edit: USD)
I don’t feel your response meaningfully engaged with either of my objections.
I strongly downvoted this post.
1. The optics of actually implementing this idea would be awful. It would permanently damage EA’s public image and be raised as a cudgel in every single exposé written about the movement. To the average person, concluding that years in the life of the poorest are worth less than those of someone in a rich, first-world country is an abhorrent statement, regardless of how well crafted your argument is.
2.1 It would also be extremely difficult for rich foreigners to objectively assess the value of QALYs in the most globally impoverished nations, regardless of good intentions and attempts to overcome biases.
2.2 There is a fair amount of arbitrariness to the metrics chosen to value someone’s life. You’ve mentioned women’s rights, but we could alternatively look at the suicide rate as a lower bound on the number of women in a society who believe additional years of their life have negative value. By choosing this reasonable-sounding metric, we can conclude that a year of a woman’s life in South Korea is much worse than a year of a woman’s life in Afghanistan. How confident are you that you’ll be able to find metrics which accurately reflect the value of a year of someone’s life?
The error in reasoning comes from making a utilitarian calculation without giving enough weight to the potential for flaws within the reasoning machine itself.
what does it mean to keep a corporation “in check”
I’m referring to effective corporate governance. Monitoring, anticipating and influencing decisions made by the corporation via a system of incentives and penalties, with the goal of ensuring actions taken by the corporation are not harmful to broader society.
do you think those mechanisms will not be available for AIs
Hopefully, but there are reasons to think that the governance of a corporation controlled (partially or wholly) by AGIs, or controlling one or more AGIs directly, may be very difficult. I will now suggest one reason this is the case, but it isn’t the only one.
Recently we’ve seen that national governments struggle with effectively taxing multinational corporations. Partially this is because the amount of money at stake is so great that multinational corporations are incentivized to invest large amounts of money into hiring teams of accountants to reduce their tax burden, or to pay money directly to politicians in the form of donations to manipulate the legal environment. It becomes harder to govern an entity as that entity invests more resources into finding flaws in your governance strategy.
Once you have the capability to harness general intelligence, you can invest a vast amount of intellectual “resources” into finding loopholes in governance strategies. So while many of the same mechanisms will be available for AIs, there’s reason to think they might not be as effective.
I don’t think I could give a meaningful number with any degree of confidence. I lack expertise in corporate governance, bio-safety and climate forecasting. Additionally, for the condition to be satisfied that corporations are left “unchecked” there would need to be a dramatic Western political shift that makes speculating extremely difficult.
I will outline my intuition for why (very large, global) human corporations could pose an existential risk (conditional on the existential risk from AI being negligible and global governance being effectively absent).
1.1 In the last hundred years, we’ve seen that (some) large corporations are willing to cause harm on a massive scale if it is profitable to do so, either intentionally or through neglect. Note that these decisions are mostly “rational” if your only concern is money.
Copying some of the examples I gave in No Summer Harvest:
Exxon chose to suppress their own research on the dangers of climate change in the late 1970s and early 1980s.
Numerous companies ignored signs that leaded gasoline was dangerous, and the introduction of the product resulted in half the US adult population being exposed to lead during childhood. Here is a paper that claims American adults born between 1966 and 1970 lost an average of 5.9 IQ points (McFarland et al., 2022, bottom of page 3).
IBM supported its German subsidiary company Dehomag throughout WWII. When the Nazis carried out the 1939 census, used to identify people with Jewish ancestry, they utilized the Dehomag D11, with “IBM” etched on the front. Later, concentration camps would use Dehomag machines to manage data related to prisoners, resources and labor within the camps. The numbers tattooed onto prisoners’ bodies were used to track them via these machines.
1.2 Some corporations have also demonstrated they’re willing to cut corners and take risks at the expense of human lives.
NASA neglected the warnings of engineers and almost a decade of test data demonstrating that there was a catastrophic flaw with SRB O-rings, resulting in the Challenger disaster. (You may be interested in reading Richard Feynman’s observations given in the Presidential Report.)
Meta’s engagement algorithm is alleged to have driven the spread of anti-Rohingya content in Myanmar and contributed to genocide.
3,787 people died and more than half a million were injured due to a gas leak at a pesticide plant in Bhopal, India. The corporation running the plant, Union Carbide India Limited, was majority owned by the US-based Union Carbide Corporation (UCC). Ultimately UCC would pay less than a dollar per person affected.
2. Without corporate governance, immoral decision-making and risk-taking behaviour could be expected to increase. If the net benefit of taking an action improves because there are fewer repercussions when things go wrong, such actions should reasonably be expected to increase in frequency.
3. In recent decades there has been a trend (at least in the US) towards greater stock market concentration. For large corporations to pose an existential risk, this trend would need to continue until individual decisions made by a small group of corporations can affect the entire world.
I am not able to describe the exact mechanism of how unchecked corporations would pose an existential risk, similar to how the exact mechanism for an AI takeover is still speculation.
You would have a small group of organisations responsible for deciding the production activities of large swaths of the globe. Possible mechanisms include:
Irreparable environmental damage.
A widespread public health crisis due to non-obvious negative externalities of production.
Premature widespread deployment of biotechnology with unintended harms.
I think if you’re already sold on the idea that “corporations are risking global extinction through the development of AI” it isn’t a giant leap to recognise that corporations could potentially threaten the world via other mechanisms.
“This argument also appears to apply to human groups such as corporations, so we need an explanation of why those are not an existential risk”
I don’t think this is necessary. It seems pretty obvious that (some) corporations could pose an existential risk if left unchecked.
Edit: And depending on your political leanings and concern over the climate, you might agree that they already are posing an existential risk.
I might be misunderstanding something crucial or am not expressing myself clearly.
I understand TurnTrout’s original post to be an argument for a set of conditions which, if satisfied, prove the AI is (probably) safe. There are no restrictions on the capabilities of the system given in the argument.
You do constructively show “that it’s possible to make an AI which very probably does not cause x-risk” using a system that cannot do anything coherent when deployed.
But TurnTrout’s post is not merely arguing that it is “possible” to build a safe AI.
Your conclusion is trivially true and there are simpler examples of “safe” systems if you don’t require them to do anything useful or coherent. For example, a fried, unpowered GPU is guaranteed to be “safe” but that isn’t telling me anything useful.
I can see that the condition you’ve given, that a “curriculum be sampled uniformly at random” with no mutual information with the real world, is sufficient for a curriculum to satisfy Premise 1 of TurnTrout’s argument.
But it isn’t immediately obvious to me that it is a sufficient and necessary condition (and therefore equivalent to Premise 1).
Right, but that isn’t a good safety case because such an AI hasn’t learnt about the world and isn’t capable of doing anything useful. I don’t see why anyone would dedicate resources to training such a machine.
I didn’t understand TurnTrout’s original argument to be limited to only “trivially safe” (i.e. non-functional) AI systems.
Does this not mean the AI has also learnt no methods that provide any economic benefit either?
Is there a difficulty in moving from statements about the variance in logits to statements about x-risk?
One is a statement about the output of a computation after a single timestep; the other is a statement about the cumulative impact of the policy over multiple timesteps in a dynamic environment that reacts in a complex way to the actions taken.
My intuition is that for any bound on the variance in the logits, you could always construct a suitably pathological environment that amplifies these cumulative deviations into a catastrophe.
(There is at least a 30% chance I haven’t grasped your idea correctly)
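To make that intuition concrete, here is a toy sketch of my own construction (not from the post, and the dynamics are deliberately contrived): two policies whose outputs differ by at most ε for the same state, placed in a hypothetical environment that doubles differences each step, end up with trajectories that diverge by many orders of magnitude more than ε.

```python
from math import tanh

eps = 1e-6    # per-step bound on how far the perturbed policy may deviate from the reference
gain = 2.0    # contrived "pathological" dynamics: state differences are doubled each step
steps = 50

def step(state, action):
    # Unstable environment: any difference in state or action is amplified over time.
    return gain * state + action

s_ref, s_pert = 0.0, 0.0
for _ in range(steps):
    a_ref = tanh(s_ref)            # stand-in for the reference policy
    a_pert = tanh(s_pert) + eps    # for the same state, differs from the reference by at most eps
    s_ref = step(s_ref, a_ref)
    s_pert = step(s_pert, a_pert)

# The per-step (same-state) deviation is bounded by eps, but the trajectories
# diverge exponentially, ending many orders of magnitude apart.
print(abs(s_pert - s_ref))
```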
My understanding is that model organisms can demonstrate the existence of an alignment failure mode. But that’s very different from an experiment on small systems informing you about effective mitigation strategies of that failure mode in larger systems.
This seems useful and I’m glad people are studying it.
I’d be very interested in experiments that demonstrate that this technique can mitigate deception in more complex experimental environments (cicero?) without otherwise degrading performance.
I have a very nitpicky criticism, but I think there might be a bit of a map/territory confusion emerging here. The introduction claims “non-deceptive agents consistently have higher mean self-other overlap than the deceptive agents”. The actual experiment is about a policy which exhibits seemingly deceptive behaviour but the causal mechanism behind this deception is not necessarily anything like the causal mechanism behind deception in self-aware general intelligences.
I have only skimmed the paper.
Is my intuition correct that in the MB formalism, past events that are causally linked to are not included in the Markov Blanket, but the node corresponding to the memory state still is included in the MB?
That is, the influence of the past event is mediated by a node corresponding to having memory of that past event?
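To illustrate the structure I have in mind, here is a minimal sketch (my own toy DAG and node names, not taken from the paper): the past event reaches the agent only through a memory node, so the standard parents-children-co-parents construction puts the memory node inside the agent’s Markov blanket but leaves the past event outside it.

```python
from itertools import chain

# Hypothetical toy DAG (edges point from parent to children).
dag = {
    "past_event": ["memory"],      # the past event writes to memory...
    "memory": ["agent"],           # ...and only memory feeds into the agent
    "environment": ["agent"],
    "agent": ["action"],
    "action": [],
}

def parents(node):
    return {p for p, children in dag.items() if node in children}

def markov_blanket(node):
    ch = set(dag[node])                                              # children
    pa = parents(node)                                               # parents
    co = set(chain.from_iterable(parents(c) for c in ch)) - {node}   # co-parents
    return pa | ch | co

# Prints {'memory', 'environment', 'action'}: the memory node is in the blanket,
# while 'past_event' is screened off, its influence mediated by 'memory'.
print(markov_blanket("agent"))
```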
I agree with your overall point re: 80k hours, but I think my model of how this works differs somewhat from yours.
“But you can’t leverage that into getting the machine to do something different- that would immediately zero out your status/cooperation score.”
The machines are groups of humans, so the degree to which you can change the overall behaviour depends on a few things.
1) The type of status (which as you hint, is not always fungible).
If you’re widely considered to be someone who is great at predicting future trends and risks, other humans in the organisation will be more willing to follow when you suggest a new course of action. If you’ve acquired status by being very good at one particular niche task, people won’t necessarily value your bold suggestion for changing the organisation’s direction.
2) Strategic congruence.
Some companies in history have successfully pivoted their business model (the example that comes to mind is Nokia). This transition is possible because while the machine is operating in a new way, the end goal of the machine remains the same (make money). If your suggested course of action conflicts with the overall goals of the machine, you will have more trouble changing the machine.
3) Structure of the machine.
Some decision making structures give specific individuals a high degree of autonomy over the direction of the machine. In those instances, having a lot of status among a small group may be enough for you to exercise a high degree of control (or get yourself placed in a decision making role).
Of course, each of these variables all interact with each other in complex ways.
Sam Altman’s high personal status as an excellent leader and decision maker, combined with his strategic alignment to making lots of money, meant that he was able to out-manoeuvre a more safety focused board when he came into apparent conflict with the machine.
Entropy production partially solves the Strawberry Problem:
The change in entropy production per second (against the counterfactual of not acting) is potentially an objectively measurable quantity that can be used, possibly in conjunction with other parameters specifying a goal, to prevent unexpected behaviour.
Rob Bensinger gives Yudkowsky’s “Strawberry Problem” as follows:
How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?
I understand the crux of this issue to be that it is exceptionally difficult for humans to construct a finite list of caveats or safety guardrails that we can be confident would withstand the optimisation pressure of a super intelligence doing its best to solve this task “optimally”. Without care, any measure chosen is Goodharted into uselessness and the most likely outcome is extinction.
Specifying that the predicted change in entropy production per second of the local region must remain within some δ of the counterfactual in which the AGI does not act at all automatically excludes almost all unexpected strategies that involve high levels of optimisation.
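In symbols (my own notation): writing $\dot{S}_{\text{act}}(t)$ for the predicted entropy production rate of the local region if the AGI acts, and $\dot{S}_{\varnothing}(t)$ for the rate under the counterfactual of not acting, the proposed constraint is

$$\left|\dot{S}_{\text{act}}(t) - \dot{S}_{\varnothing}(t)\right| \le \delta \quad \text{for all } t \text{ during and after the task.}$$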
I conjecture that the entropy production “budget” needed for an agent to perform economically useful tasks is well below the amount needed to cause an existential disaster.
Another application: directly monitoring the entropy production of an agent engaged in a generalised search upper bounds the number of iterations of that search (and hence the optimisation pressure). This bound appears to be independent of the technological implementation of the search. [1]
On a less optimistic note, this bound is many orders of magnitude above the efficiency of today’s computers.
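For concreteness, a rough version of the bound I have in mind, using the standard Landauer argument (back-of-envelope, not a careful derivation): erasing one bit produces at least $k_B \ln 2$ of entropy, so an agent whose total entropy production over the course of a search is capped at $\Delta S$ can perform at most

$$N \le \frac{\Delta S}{k_B \ln 2}$$

irreversible bit operations, regardless of what hardware implements the search. At room temperature this limit corresponds to roughly $3 \times 10^{-21}$ J of dissipation per bit operation, whereas today’s computers dissipate many orders of magnitude more per operation, which is why the bound sits so far above current efficiency.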