Yes, good catch, this is based on research from the World Values Survey—I’ve added a citation.
Hauke Hillebrandt
How to give effectively to US Dems
I checked. It’s 0.67.
This seems to come from European countries.
Yeah, I actually do cite that piece in the appendix ‘GDP as a proxy for welfare’, where I list more literature like this. So yeah, it’s not a perfect measure, but it’s the one we have (‘all models are wrong but some are useful’), and GDP is quite a powerful predictor of all kinds of outcomes:
In a 2016 paper, Jones and Klenow used measures of consumption, leisure, inequality, and mortality to create a consumption-equivalent welfare measure that allows comparisons across time for a given country, as well as across countries.[6]
This measure of human welfare suggests that the true level of welfare of some countries differs markedly from the level that might be suggested by their GDP per capita. For example, France’s GDP per capita is around 60% of US GDP per capita.[7] However, France has lower inequality, lower mortality, and more leisure time than the US. Thus, on the Jones and Klenow measure of welfare, France’s welfare per person is 92% of US welfare per person.[8]
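To make ‘consumption-equivalent welfare’ concrete, here is a toy sketch in the spirit of the Jones and Klenow measure. It is a simplified illustration, not their exact specification: it assumes a flat survival curve, no discounting, log-normally distributed consumption, and a placeholder leisure term, and every number below is made up.

```python
import math

# Toy consumption-equivalent welfare comparison, loosely in the spirit of
# Jones & Klenow (2016). Simplified: flat survival curve, no discounting,
# log-normal consumption, placeholder leisure term. All numbers are made up.

def flow_utility(c: float, leisure: float, sigma: float,
                 u_bar: float = 5.0, theta: float = 1.0) -> float:
    """Per-period utility: intercept + log consumption + leisure value
    minus an inequality penalty (variance of log consumption)."""
    return u_bar + math.log(c) + theta * math.log(leisure) - sigma ** 2 / 2

def consumption_equivalent(country: dict, benchmark: dict) -> float:
    """Scalar lambda such that scaling the benchmark's consumption by lambda
    equates its expected lifetime utility with the country's."""
    u_country = country["life_exp"] * flow_utility(
        country["c"], country["leisure"], country["sigma"])
    u_benchmark = benchmark["life_exp"] * flow_utility(
        benchmark["c"], benchmark["leisure"], benchmark["sigma"])
    log_lambda = (u_country - u_benchmark) / benchmark["life_exp"]
    return math.exp(log_lambda)

# Illustrative (made-up) inputs: consumption relative to the US, leisure share,
# std. dev. of log consumption, and life expectancy in years.
us = {"c": 1.00, "leisure": 0.55, "sigma": 0.9, "life_exp": 79}
fr = {"c": 0.60, "leisure": 0.62, "sigma": 0.7, "life_exp": 82}

print(f"lambda = {consumption_equivalent(fr, us):.2f}")  # ~0.92 with these toy inputs
```

The point is just that consumption, leisure, inequality, and mortality all enter a single lifetime-utility comparison, which is then expressed as one consumption scaling factor.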
Although GDP per capita is distinct from this expanded welfare metric, the two are very strongly correlated at 0.96, though there is substantial variation across countries, and welfare is more dispersed (standard deviation of 1.51 in logs) than income (standard deviation of 1.27 in logs).[9]
GDP per capita is also very strongly correlated with the Human Development Index, another expanded welfare metric.[10] If measures such as these are accurate, this shows that income per head explains most of the observed cross-national variation in welfare. It is a distinct question whether economic growth explains most of the observed variation across individuals in welfare. It is, however, clear that it explains a substantial fraction of the variation across individuals.
GDP per capita in 2050
You can run the compute where energy is cheap, then send the results (e.g. weights, inference outputs) wherever they are needed.
But Amazon just secured about half a nuclear power plant’s output (~1 GW) in Pennsylvania, so maybe it doesn’t make sense now.
Should we break up Google DeepMind?
Gemini 1.5 Pro summary
This document explores recent developments in the AI landscape, focusing on language models and their potential impact on society. It delves into various aspects like capabilities, ethical considerations, and regulatory challenges.
Key Highlights:
Advancements in Language Models:
Claude 3 by Anthropic now utilizes tools, including other models, showcasing increased capability and potential risks like jailbreaking and influencing other AI systems.
Gemini 1.5 by Google is available to everyone with promises of future integrations, prompting discussions on its system prompt limitations and the need for more user control over responses.
GPT-4-Turbo receives substantial upgrades, especially in coding and reasoning, but concerns about transparency and potential performance variations remain.
OpenAI’s potential development of GPT-5 sparks debates on the reasons for its delay, emphasizing the importance of rigorous safety testing before release.
Ethical and Societal Concerns:
The increasing persuasiveness of language models raises questions about manipulation and misinformation.
The use of copyrighted material in training data raises legal and ethical concerns, with potential solutions like mandatory licensing regimes being explored.
The rise of AI-generated deepfakes poses challenges to information authenticity and necessitates solutions like watermarking and detection software.
Job application processes might be disrupted by AI, leading to potential solutions like applicant review systems and matching algorithms.
The impact of AI on social media usage remains complex, with contrasting views on whether AI digests will decrease or increase time spent on these platforms.
Regulatory Landscape:
Experts propose regulations for AI systems that cannot be safely tested, emphasizing the need for proactive measures to mitigate potential risks.
Transparency in AI development, including timelines and safety protocols, is crucial for informed policy decisions.
The introduction of the AI Copyright Disclosure Act aims to address copyright infringement concerns and ensure transparency in data usage.
Canada’s investment in AI infrastructure and safety initiatives highlights the growing focus on responsible AI development and competitiveness.
Additional Points:
The document explores the concept of “AI succession” and the ethical implications of potentially superintelligent AI replacing humans.
It emphasizes the importance of accurate and nuanced communication in discussions about AI, avoiding mischaracterizations and harmful rhetoric.
The author encourages active participation in shaping AI policy and emphasizes the need for diverse perspectives, including those of AI skeptics.
Overall, the document provides a comprehensive overview of the current AI landscape, highlighting both the exciting advancements and the critical challenges that lie ahead. It emphasizes the need for responsible development, ethical considerations, and proactive regulatory measures to ensure a safe and beneficial future with AI.
Claude Opus AI summary:
The attached document is an AI-related newsletter or blog post by the author Zvi, covering a wide range of topics related to recent developments and discussions in the field of artificial intelligence. The post is divided into several sections, each focusing on a specific aspect of AI.
The main topics covered in the document include:
Recent updates and improvements to AI models like Claude, GPT-4, and Gemini, as well as the introduction of new models like TimeGPT.
The potential utility and limitations of language models in various domains, such as mental health care, decision-making, and content creation.
The increasing capabilities of AI models in persuasive writing and the implications of these advancements.
The release of the Gemini system prompt and its potential impact on AI development and usage.
The growing concern about deepfakes and the “botpocalypse,” as well as potential solutions to combat these issues.
The ongoing debate surrounding copyright and AI, with a focus on the use of copyrighted material for training AI models.
The ability of AI models to engage in algorithmic collusion when faced with existing oligopolies or auction scenarios.
The introduction of new AI-related legislation, such as the AI Copyright Disclosure Act, and the need for informed policymaking in the AI domain.
The importance of safety testing for advanced AI systems and the potential risks associated with developing AI that cannot be adequately tested for safety.
The ongoing debate between AI alignment researchers and AI accelerationists, and the potential for accelerationists to change their stance as AI capabilities advance.
A challenge issued by Victor Taelin to develop an AI prompt capable of solving a specific problem, which was successfully completed within a day, demonstrating the rapid progress and potential of AI.
The controversial views of Richard Sutton on the inevitability of AI succession and the potential for human extinction, as well as the debate surrounding his statements.
The growing public concern about AI posing an existential risk to humanity and the need for informed discussion and action on this topic.
Throughout the document, the author provides commentary, analysis, and personal opinions on the various topics discussed, offering insights into the current state of AI development and its potential future implications. The post also includes various tweets, quotes, and references to other sources to support the points being made and to provide additional context to the discussion.
cf.
“The Bootleggers and Baptists effect describes cases where an industry (e.g. bootleggers) agrees with prosocial actors like regulators (e.g. Baptists) to regulate more (here, banning alcohol during Prohibition) to maximize profits and deter entry. This seems to be happening in AI, where the industry lobbies for stricter regulation. Yet, in the EU, OpenAI lobbied to water down EU AI regulation so that GPT would not be classified as ‘high risk’, exempting it from stringent legal requirements.[1] In the US, the FTC recently said that Big Tech intimidates competition regulators.[2] Capture can also manifest as passively accepting industry practices, which is problematic in high-risk scenarios where thorough regulation is key. After all, AI expertise gathers in particular geographic communities. We must avoid cultural capture, where social preferences interfere with policy because regulators interact with workers from regulated firms. Although less of a concern in a rules-based system, a standards-based system would enable more informal influence via considerable regulator discretion. We must reduce these risks, e.g. by appointing independent regulators and requiring public disclosure of regulatory decisions.”
“Big Tech also takes greater legal risks by aggressively (and illegally) collecting data, with negative externalities for users and third parties. (Similarly, Big Tech often violates IP[3] while lobbying against laws to stop patent trolling, claiming such laws harm real patents; in fact, this makes new patents from startups worth less and more costly to enforce.)[4]”
Let’s Fund: Impact of our $1M crowdfunded grant to the Center for Clean Energy Innovation
The Bletchley Declaration on AI Safety
M&A in AI
The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate
Overview of how AI might exacerbate long-running catastrophic risks
Hanson Strawmans the AI-Ruin Argument
I don’t agree with Hanson generally, but I think there’s something to the point that rationalist AI-risk public outreach has overemphasized first-principles thinking, theory, and logical possibilities (e.g. evolution, gradient descent, the human-chimp analogy) over concrete, more tangible empirical findings (e.g. deception emerging in small models, specification gaming, LLMs helping to create WMDs, etc.).
When training AI, we should escalate the frequency of capability tests
AI labs should escalate the frequency of tests for how capable their model is as they increase compute during training
Inspired by ideas from Lucius Bushnaq, David Manheim, and Gavin Leech, but any errors are mine.
—
AI experts almost unanimously agree that AGI labs should pause the development process if sufficiently dangerous capabilities are detected. Compute, algorithms, and data form the AI triad—the main inputs to produce better AI. AI models work by using compute to run algorithms that learn from data. AI progresses due to more compute, which doubles every 6 months; more data, which doubles every 15 months; and better algorithms, which halve the need for compute every 9 months and for data every 2 years.
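As a rough illustration of how those growth rates compound (a toy calculation that uses only the doubling and halving times quoted above; not a forecast):

```python
# Toy calculation: compounding the doubling/halving times quoted above into a
# single "effective compute" growth factor. Illustrative only, not a forecast.

def growth_factors(years: float,
                   compute_doubling_months: float = 6,
                   data_doubling_months: float = 15,
                   algo_halving_months: float = 9) -> dict:
    months = years * 12
    raw_compute = 2 ** (months / compute_doubling_months)   # FLOPs available
    data = 2 ** (months / data_doubling_months)             # tokens available
    algo_efficiency = 2 ** (months / algo_halving_months)   # same result, less compute
    return {
        "raw_compute_x": raw_compute,
        "data_x": data,
        "algorithmic_efficiency_x": algo_efficiency,
        # Effective compute = raw compute * algorithmic efficiency gains.
        "effective_compute_x": raw_compute * algo_efficiency,
    }

print(growth_factors(2))  # after 2 years: raw compute ~16x, effective compute ~100x
```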
And so, better AI algorithms and software are key to AI progress (they also increase the effective compute of all chips, whereas improving chip design only improves new chips).
While training an AI model like GPT-4 has so far cost only ~$100M, most of the cost comes from running such models, as evidenced by OpenAI charging its millions of users $20/month with a cap on usage, while inference costs roughly 1 cent per 100 words.
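A back-of-the-envelope check on those numbers, using only the figures quoted above plus one assumed usage level for illustration:

```python
# Rough arithmetic on the inference-cost point above. The heavy-user figure
# is an assumption for illustration; the other numbers are quoted in the text.
price_per_month = 20.00        # USD subscription price
cost_per_100_words = 0.01      # USD inference cost per 100 words

breakeven_words = price_per_month / cost_per_100_words * 100
print(f"Break-even usage: {breakeven_words:,.0f} words/month")        # 200,000 words

heavy_user_words = 500_000     # hypothetical heavy user
heavy_user_cost = heavy_user_words / 100 * cost_per_100_words
print(f"Inference cost for that user: ${heavy_user_cost:.2f}/month")  # $50 > $20
```

This is why a usage cap matters: without it, heavy users would cost more to serve than they pay.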
And so, AI firms could train models with much more compute now and might develop dangerous capabilities.
We can more precisely measure, and predict in advance, how much compute we use to train a model, in FLOPs. Compute is also more invariant with respect to how much it will improve AI than algorithms or data are: we might be more surprised by how much effective compute we get from better or more data, or from better algorithms, software, RLHF, fine-tuning, or functionality (cf. deep learning, transformers, etc.). AI firms increasingly guard their IP, and by 2024 we will run out of public high-quality text data to improve AI. And so, AI firms like DeepMind will be at the frontier of developing the most capable AI.
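For reference, training compute can be estimated in advance with the common rule of thumb of roughly 6 FLOPs per parameter per training token for dense transformers (a standard approximation, not something claimed above; the example model size and token count are hypothetical):

```python
# Rough training-compute estimate via the common ~6 * N * D FLOPs rule of thumb
# for dense transformers (an approximation; real training runs deviate from it).

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Hypothetical example: a 70B-parameter model trained on 1.4T tokens.
print(f"{training_flops(70e9, 1.4e12):.2e} FLOPs")  # ~5.9e23 FLOPs
```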
To avoid discontinuous jumps in AI capabilities, labs must never train AI with better algorithms, software, functionality, or data using an amount of compute similar to what was used previously; rather, they should first use much less compute, pause the training, and compare how much better the model got in terms of loss and capabilities relative to the previous frontier model.
Say we train a model with better data but much less compute than we used for the last training run. If, during a pause and evaluation, the model is surprisingly better at this earlier stage than the previous frontier model (trained with a worse algorithm) was at the same stage, that signals discontinuous jumps in capabilities ahead, and we must stop the training. Software for this should be freely available to warn anyone training AI, and also implemented server-side with cryptographic protections so that researchers don’t have to worry about their IP, and policymakers should require everyone to implement it.
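A minimal sketch of that pause-and-compare check (the function names, the margin, and the evaluation hook are hypothetical; a real system would need agreed-on benchmarks and a recorded capability curve from the previous frontier run):

```python
# Sketch: periodically pause training, evaluate the new model, and halt if it
# is already ahead of the previous frontier model's trajectory at the same
# amount of training compute. All names and thresholds are illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class FrontierReference:
    """Capability scores of the previous frontier model, keyed by training FLOPs."""
    scores_by_flops: dict[float, float]

    def score_at(self, flops: float) -> float:
        # Use the closest recorded checkpoint at or below `flops`.
        eligible = [f for f in self.scores_by_flops if f <= flops]
        return self.scores_by_flops[max(eligible)] if eligible else 0.0

def should_stop(flops_so_far: float,
                evaluate_model: Callable[[], float],
                reference: FrontierReference,
                margin: float = 0.05) -> bool:
    """Stop if the new model, at this compute level, already beats the previous
    frontier model's score at the same compute by more than `margin`."""
    new_score = evaluate_model()                 # e.g. averaged benchmark accuracy
    old_score = reference.score_at(flops_so_far)
    return new_score > old_score + margin
```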
There are two kinds of performance/capabilities metrics:
Upstream, info-theoretic: perplexity / cross-entropy / bits-per-character. Cheap.
Downstream, noisy measures of actual capabilities: e.g. MMLU, ARC, SuperGLUE, BIG-bench. Costly.
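The upstream metrics are simple transforms of one another; a minimal sketch of the standard conversions (nothing here is model-specific):

```python
import math

# Standard conversions between the "upstream" metrics listed above.
# cross_entropy_nats: average negative log-likelihood per token, in nats.

def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity is the exponential of the per-token cross-entropy."""
    return math.exp(cross_entropy_nats)

def bits_per_character(cross_entropy_nats: float, chars_per_token: float) -> float:
    """Convert per-token cross-entropy (nats) to bits per character.
    `chars_per_token` is an empirical tokenizer property (~4 is a rough
    figure for English BPE, used here only as an assumption)."""
    bits_per_token = cross_entropy_nats / math.log(2)
    return bits_per_token / chars_per_token

print(perplexity(2.0))               # ~7.39
print(bits_per_character(2.0, 4.0))  # ~0.72 bits/char
```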
AGI labs might already measure upstream capabilities, since they are cheap to measure. But so far, no one is running downstream capability tests in the middle of a training run, and we should subsidize and enforce such tests. Researchers should formalize and algorithmize these tests and show how reliably they can be proxied with upstream measures. They should also develop a bootstrapping protocol analogous to ALBA, in which the current frontier LLM evaluates the downstream capabilities of a new model during training.
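A rough sketch of what such a bootstrapped evaluation loop could look like (the judge and candidate interfaces and the grading rubric are hypothetical placeholders; ALBA itself is a more involved scheme, and this is only loosely analogous):

```python
# Sketch: the previous frontier model acts as a judge, grading the in-training
# model's answers on held-out tasks. Interfaces and rubric are hypothetical.

from typing import Callable

def judged_capability_score(candidate: Callable[[str], str],
                            judge: Callable[[str, str], float],
                            tasks: list[str]) -> float:
    """Average grade (0-1) the judge assigns to the candidate's answers."""
    scores = []
    for task in tasks:
        answer = candidate(task)
        scores.append(judge(task, answer))  # judge returns a 0-1 grade
    return sum(scores) / len(scores)

# Usage sketch: plug the in-training checkpoint in as `candidate` and the
# previous frontier model (wrapped in a grading prompt) in as `judge`, then
# feed the result into the pause-and-compare check sketched earlier.
```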
Of course, if you look at deep double descent (‘Where Bigger Models and More Data Hurt’), inverse scaling laws, etc., capabilities can emerge far later in the training process. Looking at graphs of performance / loss over the training period, one might not know until halfway through (the eventually decided cutoff for training, which might itself be decided during the process) that the model is doing much better than previous approaches, and it could look worse early on. Cross-entropy loss improves even for small models while downstream metrics remain poor, which suggests that downstream metrics can mask improvements in log-likelihood. This analysis doesn’t explain why downstream capabilities emerge or how to predict when they will occur; more research is needed to understand how scale unlocks emergent abilities and to predict when they will appear. Moreover, while some argue that emergent behavior is independent of how granular the downstream evaluation metric is (e.g. whether it uses an exact string match rather than a metric that awards partial credit), those results were only tested at every order of magnitude of FLOPs.
And so, during training, as we increase the compute used, we must escalate the frequency of automated checks as the model approaches the performance of the previous frontier models (e.g. exponentially shorten the testing intervals after 10^22 FLOPs). We must automatically stop the training well before the model is predicted to reach the capabilities of the previous frontier model, so that we do not far surpass it. Alternatively, one could autostop training when it seems on track to reach the level of ability / accuracy of the previous models, to evaluate what the trajectory at that point looks like.
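A minimal sketch of such an escalating test schedule (the 10^22 FLOPs threshold is the example figure from above; the base interval, shrink factor, and stop margin are illustrative assumptions):

```python
import math

# Sketch: fixed-interval checks early in training, then exponentially shrinking
# intervals once training compute passes a threshold, plus an automatic stop
# before reaching the previous frontier. Parameter values are illustrative.

def next_checkpoint(flops_so_far: float,
                    threshold_flops: float = 1e22,
                    base_interval: float = 1e21,
                    shrink: float = 0.5) -> float:
    """Return the training compute (FLOPs) at which to run the next check."""
    if flops_so_far < threshold_flops:
        return flops_so_far + base_interval
    # Past the threshold, halve the interval for every order of magnitude of
    # compute beyond it, so checks become exponentially more frequent.
    decades_past = math.log10(flops_so_far / threshold_flops)
    return flops_so_far + base_interval * (shrink ** decades_past)

def should_autostop(predicted_score: float, frontier_score: float,
                    margin: float = 0.02) -> bool:
    """Stop training once the model is on track to reach the previous frontier."""
    return predicted_score >= frontier_score - margin
```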
Figure from: ‘Adjacent plots for error rate and cross-entropy loss on three emergent generative tasks in BIG-Bench for LaMDA. We show error rate for both greedy decoding (T = 0) as well as random sampling (T = 1). Error rate is (1- exact match score) for modified arithmetic and word unscramble, and (1- BLEU score) for IPA transliterate.’
Figure from: ‘Adjacent plots for error rate, cross-entropy loss, and log probabilities of correct and incorrect responses on three classification tasks on BIG-Bench that we consider to demonstrate emergent abilities. Logical arguments only has 32 samples, which may contribute to noise. Error rate is (1- accuracy).’
This lag effect might be amplified a lot more when big-budget movies about SBF/FTX come out.