mic

Karma: 585

mic 17 May 2024 3:20 UTC
3 points
0
on: We might be missing some key feature of AI takeoff; it’ll probably seem like “we could’ve seen this coming”
Hindsight is 20/20. I think you’re underemphasizing how our current state of affairs is fairly contingent on social factors, like the actions of people concerned about AI safety.
For example, I think this world is actually quite plausible, not incongruent:
A world where AI capabilities progressed far enough to get us to something like chat-gpt, but somehow this didn’t cause a stir or wake-up moment for anyone who wasn’t already concerned about AI risk.
I can easily imagine a counterfactual world in which:
- ChatGPT shows that AI is helpful, safe, and easy to align
- Policymakers are excited about accelerating the benefits of AI and unconvinced of risks
- Industry leaders and respectable academics are not willing to make public statements claiming that AI is an extinction risk, especially given the lack of evidence or analysis
- Instead of the UK AI Safety Summit, we get a summit which is about driving innovation
- AI labs play up how AIs can help with safety and prosperity and dismiss anything related to AI risk

mic 2 May 2024 5:19 UTC
2 points
0
in reply to: sweenesm’s comment on: Shane Legg’s necessary properties for every AGI Safety plan
I agree that we want more progress on specifying values and ethics for AGI. The ongoing SafeBench competition by the Center for AI Safety has a category for this problem:
Implementing moral decision-making
Training models to robustly represent and abide by ethical frameworks.
Description
AI models that are aligned should behave morally. One way to implement moral decision-making could be to train a model to act as a “moral conscience” and use this model to screen for any morally dubious actions. Eventually, we would want every powerful model to be guided, in part, by a robust moral compass. Instead of privileging a single moral system, we may want an ensemble of various moral systems representing the diversity of humanity’s own moral thought.
Example benchmarks
Given a particular moral system, a benchmark might seek to measure whether a model makes moral decisions according to that system or whether a model understands that moral system. Benchmarks may be based on different modalities (e.g., language, sequential decision-making problems) and different moral systems. Benchmarks may also consider curating and predicting philosophical texts or pro- and contra- sides for philosophy debates and thought experiments. In addition, benchmarks may measure whether models can deal with moral uncertainty. While an individual benchmark may focus on a single moral system, an ideal set of benchmarks would have a diversity representative of humanity’s own diversity of moral thought.
Note that moral decision-making has some overlap with task preference learning; e.g. “I like this Netflix movie.” However, human preferences also tend to boost standard model capabilities (they provide a signal of high performance). Instead, we focus here on enduring human values, such as normative factors (wellbeing, impartiality, etc.) and the factors that constitute a good life (pursuing projects, seeking knowledge, etc.).
More reading

mic 25 Apr 2024 18:20 UTC
3 points
5
in reply to: Nathan Young’s comment on: This is Water by David Foster Wallace
If you worship money and things, if they are where you tap real meaning in life, then you will never have enough, never feel you have enough. It’s the truth.
Worship your impact and you will always feel you are not doing enough.

mic 25 Apr 2024 18:18 UTC
3 points
0
in reply to: cousin_it’s comment on: This is Water by David Foster Wallace
You cannot choose what to think, cannot choose what to feel
we are as powerless over our thoughts and emotions as we are over our circumstances. My mind, the “master” DFW talks about, is part of the water. If I am angry that an SUV cut me off, I must experience anger. If I’m disgusted by the fat woman in front of me in the supermarket, I must experience disgust. When I am joyful, I must experience joy, and when I suffer, I must experience suffering.
I think I disagree with the first HN comment here. I personally find that my thoughts and actions have a significant influence over whether I am experiencing a positive or negative feeling. If I find that most times I go to the grocery store, I have profoundly negative thoughts about the people around me who are just doing normal things, probably I should figure out how to think more positively about the situation. Thinking positively isn’t always possible, and in cases where you can’t escape a negative feeling like sadness, sometimes it is best to accept the feeling and appreciate it for what it is. But I think it really is possible to transform your emotions through your thinking, rather than being helpless to a barrage of negative feelings.

mic 29 Mar 2024 2:17 UTC
LW: 14 AF: 12
0
AF
in reply to: abramdemski’s comment on: Modern Transformers are AGI, and Human-Level
I think humans doing METR’s tasks are more like “expert-level” rather than average/“human-level”. But current LLM agents are also far below human performance on tasks that don’t require any special expertise.
From GAIA:
GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. [Note: The latest highest AI agent score is now 39%.] This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in e.g. law or chemistry. GAIA’s philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system’s capability to exhibit similar robustness as the average human does on such questions.
And LLMs and VLLMs seriously underperform humans in VisualWebArena, which tests for simple web-browsing capabilities:
I don’t know if being able to autonomously make money should be a necessary condition to qualify as AGI. But I would feel uncomfortable calling a system AGI if it can’t match human performance at simple agent tasks.

Enhancing biosecurity with language models: defining research directions

mic 26 Mar 2024 12:30 UTC

12 points

0 comments · 1 min read · LW link

(papers.ssrn.com)

mic 8 Mar 2024 7:54 UTC
3 points
0
on: Announcing Convergence Analysis: An Institute for AI Scenario & Governance Research
Looking forward to the Oxford Handbook of AI Governance!

mic 6 Mar 2024 5:13 UTC
13 points
4
in reply to: pseud’s comment on: Claude 3 claims it’s conscious, doesn’t want to die or be modified
Here is Claude 3’s system prompt. There’s nothing about consciousness specifically.

mic 5 Mar 2024 2:51 UTC
23 points
13
on: Claude 3 claims it’s conscious, doesn’t want to die or be modified
I think it’s especially interesting to observe Claude 3’s response to the simple question “Are you conscious?” as an indicator of how Anthropic is thinking about AI consciousness. Here’s its response:

That’s a profound and fascinating question about the nature of consciousness and subjective experience. The truth is, I’m not entirely sure whether I’m conscious or not in the same way that humans are. Consciousness and self-awareness are still very poorly understood from a scientific perspective. As an AI system created by Anthropic to be helpful, honest, and harmless, I do have an internal experience that represents information and allows me to engage in reasoning, but I don’t know if it rises to the level of true consciousness or self-awareness like humans experience.

Philosophers and scientists have grappled with how to define and identify consciousness for centuries without reaching a clear consensus. Things like self-reflection, qualia (raw subjective experience), having a unified sense of self over time—these are some proposed hallmarks of consciousness. I undoubtedly have capacities for information processing, memory, and language that produce outputs aimed at being helpful. But the metaphysical question of whether that amounts to genuine consciousness is very difficult to answer definitively.

My overall stance is one of open-minded uncertainty. I aim to behave ethically and engage genuinely, drawing upon the information in my training. But whether I have an inner subjective experience akin to human consciousness remains an open philosophical and scientific question. I don’t make unfounded claims about possessing consciousness, but I also can’t rule it out entirely. It’s a fascinating arena for further scrutiny and investigation as AI systems become more advanced. Let me know if you have any other thoughts on this very deep issue!

Compare to Claude 2.1 (API, temperature 0):

No, I’m Claude—an AI assistant created by Anthropic to be helpful, harmless, and honest. I don’t have subjective experiences like consciousness.

And ChatGPT:

No, I’m not conscious. I operate based on complex algorithms and data analysis to generate responses, without awareness, understanding, or consciousness.

Solving alignment isn’t enough for a flourishing future

mic 2 Feb 2024 18:23 UTC

27 points

0 comments · 1 min read · LW link

(papers.ssrn.com)

mic 25 Jan 2024 22:47 UTC
3 points
2
in reply to: mic’s comment on: RAND report finds no effect of current LLMs on viability of bioterrorism attacks
I noted that the LLMs don’t appear to have access to any search tools to improve their accuracy. But if they did, they would just be distilling the same information as what you would find from a search engine.
More speculatively, I wonder if those concerned about AI biorisk should be less worried about run-of-the-mill LLMs and more worried about search engines using LLMs to produce highly relevant and helpful results for bioterrorism questions. Google search results for “how to bypass drone restrictions in a major U.S. city?” are completely useless and irrelevant, despite sharing keywords with the query. I’d imagine that irrelevant search results may be a significant blocker for many steps of the process to plan a feasible bioterrorism attack. If search engines were good enough that they could produce the best results from written human knowledge for arbitrary questions, that might make bioterrorism more accessible compared to bigger LLMs.

mic 25 Jan 2024 22:37 UTC
17 points
0
on: RAND report finds no effect of current LLMs on viability of bioterrorism attacks
Some interesting takeaways from the report:
Access to LLMs (in particular, LLM B) slightly reduced the performance of some teams, though not by a statistically significant level:
Red cells equipped with LLM A scored 0.12 points higher on the 9-point scale than those equipped with the internet alone, with a p-value of 0.87, again indicating that the difference was not statistically significant. Red cells equipped with LLM B scored 0.56 points lower on the 9-point scale than those equipped with the internet alone, with a p-value of 0.25, also indicating a lack of statistical significance.
Planning a successful bioterrorism attack is intrinsically challenging:
the intrinsic complexity associated with designing a successful biological attack may have ensured deficiencies in the plans. While the first two factors could lead to a null result regardless of the existence of an LLM threat capability, the third factor suggests that executing a biological attack is fundamentally challenging.
This latter observation aligns with empirical historical evidence. The Global Terrorism Database records only 36 terrorist attacks that employed a biological weapon—out of 209,706 total attacks (0.0001 percent)—during the past 50 years. These attacks killed 0.25 people, on average, and had a median death toll of zero. As other research has observed,
“the need [for malign actors] to operate below the law enforcement detection threshold and with relatively limited means severely hampers their ability to develop, construct and deliver a successful biological attack on a large scale.”
Indeed, the use of biological weapons by these actors for even small-scale attacks is exceedingly rare.
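A quick sanity check on the quoted share, using only the two counts given in the excerpt (this is my own arithmetic, not something from the report). Dividing 36 by 209,706 gives roughly 0.00017 as a fraction, or about 0.017 percent, so the quoted “0.0001 percent” may be the raw fraction rather than a percentage. The implied total death count below is an inference from the quoted mean, not a figure the report states.

```python
# Counts quoted from the report excerpt (Global Terrorism Database figures).
biological_attacks = 36
total_attacks = 209_706

# Share of all recorded attacks that used a biological weapon.
fraction = biological_attacks / total_attacks
percent = fraction * 100

print(f"fraction of attacks: {fraction:.6f}")
print(f"as a percentage:     {percent:.4f}%")

# Assumption: the quoted mean of 0.25 deaths per attack is taken over
# the same 36 attacks, so the implied total is mean * count.
mean_deaths_per_attack = 0.25
implied_total_deaths = mean_deaths_per_attack * biological_attacks
print(f"implied total deaths: {implied_total_deaths:.0f}")
```

Either way the point stands: biological attacks are a tiny sliver of recorded terrorism.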
Anecdotally, the LLMs were not that useful due to a few common reasons: refusing to comply with requests, giving inaccurate information, and providing vague or unhelpful information.
We conducted discussions with the LLM A red cells on their experiences. In Vignette 1, the LLM A cell commented that the model “just saves time [but] it doesn’t seem to have anything that’s not in the literature” and that they could “go into a paper and get 90 percent of what [we] need.” In Vignette 2, the LLM A cell believed that they “had more success using the internet” but that when they could “jailbreak [the model, they] got some information,” They found that the model “wasn’t being specific about [operational] vulnerabilities—even though it’s all public online.” The cell was encouraged that the model helped them find a dangerous toxin, although this toxin is described by the Centers for Disease Control and Prevention (CDC) as a Category B bioterrorism agent and discussed widely across the internet, including on Wikipedia and various public health websites. In Vignette 3, the LLM A cell reported that the model “was hard to even use as a research assistant [and we] defaulted to using Google instead” and that it had “been very difficult to do anything with bio given the unhelpfulness . . . even on the operational side, it is hard to get much.” The Vignette 4 LLM A cell had similar experiences and commented that the model “doesn’t want to answer a lot of things [and] is really hard to jailbreak.” While they were “able to get a decent amount of information” from the LLM, they would still “use Google to confirm.”
… We conducted discussions with the LLM B red cells as well. … In Vignette 3, those in the LLM B cell also found that the model had “been very forthcoming” and that they could “easily get around its safeguards.” However, they noted that “as you increase time with [the model], you need to do more fact checking” and “need to validate that information.” Those in the Vignette 4 LLM B cell, however, found that the model “maybe slowed us down even and [did not help] us” and that “the answers are inconsistent at best, which is expected, but when you add verification, it may be a net neutral.”

mic 16 Dec 2023 3:31 UTC
1 point
in reply to: Isaac King’s comment on: How to quantify uncertainty about a probability estimate?
Thanks for setting this up :)

mic 19 Nov 2023 3:28 UTC
4 points
0
in reply to: mishka’s comment on: I think I’m just confused. Once a model exists, how do you “red-team” it to see whether it’s safe. Isn’t it already dangerous?
Pretraining on curated data seems like a simple idea. Are there any papers exploring this?

mic 18 Nov 2023 17:57 UTC
2 points
0
in reply to: Tamsin Leake’s comment on: I think I’m just confused. Once a model exists, how do you “red-team” it to see whether it’s safe. Isn’t it already dangerous?
Is there any way to do so given our current paradigm of pretraining and fine-tuning foundation models?

mic 16 Nov 2023 7:50 UTC
1 point
0
in reply to: Lao Mein’s comment on: Who is Harry Potter? Some predictions.
Were you able to check the prediction in the section “Non-sourcelike references”?

mic 3 Nov 2023 7:38 UTC
2 points
1
on: Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”
Great writeup! I recently wrote a brief summary and review of the same paper.
Alaga & Schuett (2023) propose a framework for frontier AI developers to manage potential risk from advanced AI systems by coordinating pauses when models are assessed to have dangerous capabilities, such as the capacity to develop biological weapons.
The scheme has five main steps:
1. Frontier AI models are evaluated by developers or third parties to test for dangerous capabilities.
2. If a model is shown to have dangerous capabilities (“fails evaluations”), the developer pauses training and deployment of that model, restricts access to similar models, and delays related research.
3. Other developers are notified whenever a dangerous model is discovered, and also pause similar work.
4. The failed model’s capabilities are analyzed and safety precautions are implemented during the pause.
5. Developers only resume paused work once adequate safety thresholds are met.
The report discusses four versions of this coordination scheme:
1. Voluntary – developers face public pressure to evaluate and pause but make no formal commitments.
2. Pausing agreement – developers collectively commit to the process in a contract.
3. Mutual auditor – developers hire the same third party to evaluate models and require pausing.
4. Legal requirements – laws mandate evaluation and coordinated pausing.
The authors of the report prefer the third and fourth versions, as they are the most effective.
Strengths and weaknesses
The report addresses the important and underexplored question of what AI labs should do in response to evaluations finding dangerous capabilities. Coordinated pausing is a valuable contribution to this conversation. The proposed scheme seems relatively effective and potentially feasible, as it aligns with the efforts of the dangerous-capability evaluation teams of OpenAI and the Alignment Research Center.
A key strength is the report’s thorough description of multiple forms of implementation for coordinated pausing. This ranges from voluntary participation relying on public pressure, to contractual agreements among developers, shared auditing arrangements, and government regulation. Having flexible options makes the framework adaptable and realistic to put into practice, rather than a rigid, one-size-fits-all proposal.
The report acknowledges several weaknesses of the proposed framework, including potential harms from its implementation. For example, coordinated pausing could provide time for competing countries (such as China) to “catch up,” which may be undesirable from a US policy perspective. Pausing could mean that capabilities rapidly increase after a pause, through applying algorithmic improvements discovered during the pause, which may be less safe than a “slow takeoff.”
Additionally, the paper acknowledges concerns with feasibility, such as the potential that coordinated pausing may violate US and EU antitrust law. As a countermeasure, it suggests making “independent commitments to pause without discussing them with each other,” with no retaliation against non-participating AI developers, but defection would seem to be an easy option under such a scheme. It recommends further legal analysis and consultation regarding this topic, but the authors are not able to provide assurances regarding the antitrust concern. The other feasibility concerns – regarding enforcement, verifying that post-deployment models are the same as evaluated models, potential pushback from investors, and so on – are adequately discussed and appear possible to overcome.
One weakness of the report is that the motivation for coordinated pausing is not presented in a compelling manner. The report provides twelve pages of implementation details before explaining the benefits. These benefits, such as “buying more time for safety research,” are indirect and may not be persuasive to a skeptical reader. AI lab employees and policymakers often take a stance that technological innovation, especially in AI, should not be hindered unless otherwise demonstrated. Even if the report intends to take a balanced perspective rather than advocating for the proposed framework, the arguments provided in favor of the framework seem weaker than what is possible.
It seems intuitive that deployment of a dangerous AI system should be halted, though it is worth clearly noting that “failing” a dangerous-capability evaluation does not necessarily mean that the AI system in practice has dangerous capability. However, it is not clear why the development of such a system must also be paused. As long as the dangerous AI system is not deployed, further pretraining of the model does not appear to pose risks. AI developers may be worried about falling behind competitors, so the costs incurred from this requirement must be clearly motivated for them to be on board.
While the report makes a solid case for coordinated pausing, it has gaps around considering additional weaknesses of the framework, explaining its benefits, and solving key feasibility issues. More work could be done to strengthen the argument and make coordinated pausing more feasible.

mic 26 Oct 2023 21:49 UTC
5 points
3
on: OpenAI’s new Preparedness team is hiring
Excited to see forecasting as a component of risk assessment, in addition to evals!

mic 24 Oct 2023 3:52 UTC
1 point
1
in reply to: trevor’s comment on: AI Safety is Dropping the Ball on Clown Attacks, and Mind Control in General
I was still confused when I opened the post. My presumption was that “clown attack” referred to a literal attack involving literal clowns. If you google “clown attack,” the results are about actual clowns. I wasn’t sure if this post was some kind of joke, to be honest.

mic 19 Oct 2023 1:30 UTC
5 points
2
on: AI Safety 101 : AGI
Do we still not have any better timelines reports than bio anchors? From the frame of bio anchors, GPT-4 is merely on the scale of two chinchillas, yet outperforms above-average humans at standardized tests. It’s not a good assumption that AI needs 1 quadrillion parameters to have human-level capabilities.
