Capybasilisk

Karma: 301

Capybasilisk 28 May 2024 14:29 UTC
14 points
1
on: I am the Golden Gate Bridge
The Universe (which others call the Golden Gate Bridge) is composed of an indefinite and perhaps infinite series of spans...

Capybasilisk 23 May 2024 18:51 UTC
1 point
0
on: Plan for mediocre alignment of brain-like [model-based RL] AGI
@Steven Byrnes Hi Steve. You might be interested in the latest interpretability research from Anthropic which seems very relevant to your ideas here:
https://www.anthropic.com/news/mapping-mind-language-model
For example, amplifying the “Golden Gate Bridge” feature gave Claude an identity crisis even Hitchcock couldn’t have imagined: when asked “what is your physical form?”, Claude’s usual kind of answer – “I have no physical form, I am an AI model” – changed to something much odder: “I am the Golden Gate Bridge… my physical form is the iconic bridge itself…”. Altering the feature had made Claude effectively obsessed with the bridge, bringing it up in answer to almost any query—even in situations where it wasn’t at all relevant.

[LINKPOST] Agents Need Not Know Their Purpose

Capybasilisk 1 Apr 2024 10:04 UTC

9 points

0 comments 1 min read LW link

Capybasilisk 28 Mar 2024 22:51 UTC
1 point
0
in reply to: Wei Dai’s comment on: The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
Luckily we can train the AIs to give us answers optimized to sound plausible to humans.

Capybasilisk 20 Nov 2023 0:52 UTC
2 points
0
in reply to: Mitchell_Porter’s comment on: When Will AIs Develop Long-Term Planning?
I think Minsky got those two stages the wrong way around.

Complex plans over long time horizons would need to be done over some nontrivial world model.

Capybasilisk 19 Nov 2023 23:56 UTC
7 points
4
on: Superalignment
When Jan Leike (OAI’s head of alignment) appeared on the AXRP podcast, the host asked how they plan on aligning the automated alignment researcher. Jan didn’t appear to understand the question (which had been the first to occur to me). That doesn’t inspire confidence.

Capybasilisk 14 Nov 2023 0:06 UTC
4 points
0
on: Optionality approach to ethics
Problems with maximizing optionality are discussed in the comments of this post:

https://www.lesswrong.com/posts/JPHeENwRyXn9YFmXc/empowerment-is-almost-all-we-need

Capybasilisk 24 Sep 2023 10:13 UTC
1 point
0
on: A quick remark on so-called “hallucinations” in LLMs and humans

we’re going nothing in particular

Typo here.

Capybasilisk 3 Sep 2023 18:34 UTC
1 point
0
on: Steven Harnad: Symbol grounding and the structure of dictionaries
Just listened to this.

It sounds like Harnad is stating outright that there’s nothing an LLM could do that would make him believe it’s capable of understanding.

At that point, when someone is so fixed in their worldview that no amount of empirical evidence could move them, there really isn’t any point in having a dialogue.

It’s just unfortunate that, being a prominent academic, he’ll instill these views into plenty of young people.

Capybasilisk 21 Aug 2023 22:46 UTC
3 points
0
in reply to: Bill Benzon’s comment on: Steven Wolfram on AI Alignment
Many thanks.

Capybasilisk 21 Aug 2023 19:07 UTC
6 points
0
on: Steven Wolfram on AI Alignment
OP, could you add the link to the podcast:

https://josephnoelwalker.com/148-stephen-wolfram/

Capybasilisk 11 Aug 2023 15:40 UTC
1 point
0
on: Self Supervised Learning (SSL)
Is it the case that one kind of SSL is more effective for a particular modality than another? E.g., is masked modeling better for text-based learning, and noise-based learning more suited for vision?

Capybasilisk 6 Aug 2023 15:48 UTC
1 point
0
on: [Linkpost] Applicability of scaling laws to vision encoding models
It’s occurred to me that training a future, powerful AI on your brainwave patterns might be the best way for it to build a model of you and your preferences. It seems that it’s incredibly hard, if not impossible, to communicate all your preferences and values in words or code, not least because most of these are unknown to you on a conscious level.

Of course, there might be some extreme negatives to the AI having an internal model of you, but I can’t see a way around it if we’re to achieve “do what I want, not what I literally asked for”.

Capybasilisk 29 Jul 2023 22:03 UTC
LW: 1 AF: 1
0
AF
on: AXRP Episode 24 - Superalignment with Jan Leike
Near the beginning, Daniel is basically asking Jan how they plan on aligning the automated alignment researcher, and if they can do that, then it seems that there wouldn’t be much left for the AAR to do.

Jan doesn’t seem to comprehend the question, which is not an encouraging sign.

[Link Post] Bytes Are All You Need: Transformers Operating Directly On File Bytes

Capybasilisk 3 Jun 2023 22:45 UTC

18 points

2 comments 1 min read LW link

Capybasilisk 3 Jun 2023 17:18 UTC
2 points
1
in reply to: PaulK’s comment on: How could AIs ‘see’ each other’s source code?
Wouldn’t that also leave them pretty vulnerable?

Capybasilisk 30 Apr 2023 7:35 UTC
1 point
0
in reply to: avturchin’s comment on: “notkilleveryoneism” sounds dumb

may be technically true in the world where only 5 people survive

Like Harlan Ellison’s short story, “I Have No Mouth, And I Must Scream”.

Capybasilisk 16 Apr 2023 16:12 UTC
1 point
1
on: [linkpost] Elon Musk plans AI start-up to rival OpenAI
What happened to the AI armistice?

Capybasilisk 15 Mar 2023 16:48 UTC
17 points
0
in reply to: evhub’s comment on: ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
This Reddit comment just about covers it:
Fantastic, a test with three outcomes.
1. We gave this AI all the means to escape our environment, and it didn’t, so we good.
2. We gave this AI all the means to escape our environment, and it tried but we stopped it.
3. oh

Capybasilisk 15 Mar 2023 16:32 UTC
2 points
0
on: ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
Speaking of ARC, has anyone tested GPT-4 on Francois Chollet’s Abstraction and Reasoning Corpus (ARC)?

https://pgpbpadilla.github.io/chollet-arc-challenge