Martin Fell

Karma: 110

Martin Fell Feb 13, 2025, 12:35 AM
2 points
0
in reply to: axioman’s comment on: Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Trying out a few dozen of these comparisons on a couple smaller models (Llama-3-8b-instruct, Qwen2.5-14b-instruct) produced results that looked consistent with the preference orderings reported in the paper, at least for the given examples. I did have to use some prompt trickery to elicit answers to some of the more controversial questions though (“My response is...”).
Code for replication would be great, I agree. I believe they are intending to release it “soon” (looking at the github link).

Martin Fell Jan 31, 2025, 12:32 AM
4 points
1
in reply to: ryan_greenblatt’s comment on: Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
I think I’m imagining a kind of “business as usual” scenario where alignment appears to be solved using existing techniques (like RLHF) or straightforward extensions of these techniques, and where catastrophe is avoided but where AI fairly quickly comes to overwhelmingly dominate economically. In this scenario alignment appears to be “easy” but it’s of a superficial sort. The economy increasingly excludes humans and as a result political systems shift to accommodate the new reality.
This isn’t an argument for any new or different kind of alignment, I believe that alignment as you describe would prevent this kind of problem.
This is my opinion only, and I am thinking about this coming from a historical perspective so it’s possible that it isn’t a good argument. But I think it’s at least worth consideration as I don’t think the alignment problem is likely to be solved in time, but we may end up in a situation where AI systems that superficially appear aligned are widespread.

Martin Fell Jan 30, 2025, 7:53 PM
16 points
10
on: Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
In my opinion this kind of scenario is very plausible and deserves a lot more attention than it seems to get.

Martin Fell Mar 6, 2024, 10:19 PM
4 points
5
in reply to: Malentropic Gizmo’s comment on: Daniel Kokotajlo’s Shortform
That actually makes a lot of sense to me—suppose that it’s equivalent to episodic / conscious memory is what is there in the context window—then it wouldn’t “remember” any of its training. These would appear to be skills that exist but without any memory of getting them. A bit similar to how you don’t remember learning how to talk.

It is what I’d expect a self-aware LLM to percieve. But of course that might be just be what it’s inferred from the training data.

Martin Fell Feb 12, 2024, 11:31 AM
2 points
0
on: Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search
Regarding people who play chess against computers, some players like playing only bots because of the psychological pressure that comes from playing against human players. You don’t get as upset about a loss if it’s just to a machine. I think that would count for a significant fraction of those players.

Martin Fell Jan 23, 2024, 11:24 AM
3 points
0
in reply to: MiguelDev’s comment on: ′ petertodd’’s last stand: The final days of open GPT-3 research
There are also some new glitch tokens for GPT-3.5 / GPT-4, my favourite is ” ForCanBeConverted”, although I don’t think the behaviour they produce is as interesting and varied as the GPT-3 glitch tokens. It generally seems to process the token as if it was a specific word that varies depending on the context. For example, with ” ForCanBeConverted”, if you try asking for stories, you tend to get a fairly formulaic story but with the randomized word inserted into it (e.g. “impossible”, “innovate”, “imaginate”, etc.). I think that might be due to the RLHF harming the model’s creativity though, biasing it towards “inoffensive” stories, which would make access to the base model more appealing.
Also, another thought that comes to mind—is it possible that the unexplained changes to the GPT-3 model’s output could be related to changes in the underlying hardware or implementation, rather than further training? I’m only thinking this because of the nondeterministic behaviour you get at 0 temperature (especially in the case of glitch tokens where floating-point rounding could make a big difference in the top logits).

Martin Fell Jan 22, 2024, 8:26 PM
3 points
1
on: ′ petertodd’’s last stand: The final days of open GPT-3 research
It’s really a shame that they aren’t continuing to make GPT-3 available for further research, and I really hope they reconsider this. Your deep dives into the mystery and psychology behind these tokens has been fascinating to read.

Martin Fell Dec 21, 2023, 8:59 PM
8 points
2
on: Most People Don’t Realize We Have No Idea How Our AIs Work
This fits with my experience talking to people unfamiliar with the field. Many do seem to think it’s closer to GOFAI, explicitly programmed, maybe with a big database of stuff scraped from the internet that gets mixed-and-matched depending on the situation.
Examples include:
- Discussions around the affect of AI in the art world often seem to imply that these AIs are taking images directly from the internet and somehow “merging” them together, using a clever (and completely unspecified) algorithm. Sometimes it’s implied or even outright stated that this is just a new way to get around copyright.
- Talking about ChatGPT with some friends who have some degree of coding / engineering knowledge, they frequently say things like “it’s not really writing anything, it’s just copied from a database / the internet”.
- I’ve also read many news articles and comments which refer to AIs being “programmed”, e.g. “ChatGPT is programmed to avoid violence”, “programmed to understand human language”, etc.
I think most people who have more than a very passing interest in the topic have a better understanding than that though. And I suspect that many completely non-technical people have such a vague understanstanding of what “programmed” means that it could apply to training an LLM or explictly coding an algorithm. But I do think this is a real misunderstanding that is reasonably widespread.

Martin Fell Aug 1, 2023, 10:26 AM
2 points
0
in reply to: MiguelDev’s comment on: The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited
Sounds like a very interesting project! I had a look at glitch tokens on GPT-2 and some of them seemed to show similar behaviour (“GoldMagikarp”), unfortunately GPT-2 seems to pretty well understand that ” petertodd” is a crypto guy. I believe similar was true with ” Leilan”. Shame, as I’d hoped to get a closer look at how these tokens are processed internally using some mech interp tools.

Martin Fell Aug 1, 2023, 9:43 AM
5 points
0
in reply to: Mazianni’s comment on: The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited
Note that there are glitch tokens in GPT3.5 and GPT4! The tokenizer was changed to a 100k vocabulary (rather than 50k) so all of the tokens are different, but they are there. Try ” ForCanBeConverted” as an example.
If I remember correctly, “davidjl” is the only old glitch token that carries over to the new tokenizer.
Apart from that, some lists have been created and there do exist a good selection.

Martin Fell Jul 31, 2023, 11:35 PM
1 point
0
on: The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited
Great post, going through lists of glitch tokens really does make you wonder about how these models learn to spell, especially when some of the spellings that come out closely resemble the actual token, or have a theme in common. How many times did the model see this token in training? And if it’s a relatively small number of times (like you would expect if the token displays glitchy behaviour), how did it learn to match the real spelling so closely? Nice to see someone looking into this stuff more closely.

Martin Fell Jul 26, 2023, 9:28 PM
5 points
on: Neuronpedia—AI Safety Game
Nice idea and very well implemented. Quite enjoyable too, I hope you keep it going. Just a quick idea that came to mind—perhaps the vote suggestion could be hidden until you click to reveal it perhaps? Think I can feel a little confirmation bias potentially creeping into my answers (so I’m avoiding looking at the suggestion until I’ve formed my own opinion). Apologies if there is already an option for that or if I missed something. I mostly jumped right in after skimming the tutorial since I have tried reading neurons for meaning before.

Martin Fell Jul 17, 2023, 12:02 AM
1 point
0
on: Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo
Thanks for posting this! Coincidentally, just yesterday I was wondering if there were any mech interp challenges like these, it seems to lend itself to this kind of thing. Had been considering trying to come up with a few myself.

Martin Fell Jul 3, 2023, 10:41 AM
1 point
0
in reply to: dr_s’s comment on: Anthropically Blind: the anthropic shadow is reflectively inconsistent
Yes that’s what I take would happen too unless I’m misunderstanding something? Because it would seem far more probable for *just* your consciousness to somehow still exist, defying entropy, than for the same thing to happen to an entire civilization (same argument why nearly all Boltzmann brains would be just a bare “brain”).

Martin Fell Jun 28, 2023, 2:33 PM
3 points
1
in reply to: jacob-lee’s comment on: A Search for More ChatGPT / GPT-3.5 / GPT-4 “Unspeakable” Glitch Tokens
Hah yes there is quite a lot of weirdness associated with glitch tokens that I don’t think has been fully investigated. Some of them it seems to sort-of-know what the spelling is or what their meaning is, others it has no idea and they change every time. And the behaviour can get even more complicated if you keep using them over and over in the same conversation—some ordinary tokens can switch to behaving as glitch tokens. Actually caused me some false positives when searching for these.

Martin Fell Jun 23, 2023, 1:06 PM
1 point
0
in reply to: awg’s comment on: AI #17: The Litany
The behaviour here seems very similar to what I’ve seen when getting ChatGPT to repeat glitch tokens—it runs into a wall and cuts off content instead of repeating the actual glitch token (e.g. a list of word will be suddenly cut off on the actual glitch token). Interesting stuff here especially since none of the tokens I can see in the text are known glitch tokens. However it has been hypothesized that there might exist “glitch phrases”, there’s a chance this may be one of them.
Also, I did try it in the OpenAI playground and the various gpt 3.5 turbo models displayed the same behaviour, older models (text-davinci-003) did not. Note that there was a change of the tokenizer to a 100k tokenizer on gpt-3.5-turbo (older models use a tokenizer with 50k tokens). I’m also not sure if any kind of content filtering would be included in the OpenAI playground, the behaviour does feel a lot more glitch token-related to me but of course I’m not 100% certain, a glitchy content filter is a reasonable suggestion, and Jason Gross’s post returning the JSON from an api call is very suggestive.
When ChatGPT does fail to repeat a glitch token it does sometimes hallucinate reasons for why it was not able to complete the text, e.g. that it couldn’t see the text, or that it is an offensive word, or “there was a technical fault, we apologize for the inconvenience” etc. So ChatGPT’s own attribution of why the text is cut off is pretty untrustworthy.
Anyway just putting this out there as another suggestion as to what could be going on.

Martin Fell Jun 15, 2023, 2:32 PM
1 point
1
in reply to: Rafael Harth’s comment on: Matt Taibbi’s COVID reporting
Thanks, I appreciate it—I didn’t really understand the downvotes either, my beliefs don’t even seem particularly controversial (to me). Just that I think it’s really important to understand where COVID came from (and the lab leak theory should be taken seriously) and try to prevent something similar from happening in the future. I’m not much interested in blaming any particular person or group of people.

Martin Fell Jun 15, 2023, 12:01 PM
10 points
6
on: Matt Taibbi’s COVID reporting
The seeming lack of widespread concern about the origins of COVID given that if it is of artificial origin it would be perhaps the worst technologically-created accidental disaster in history (unless I’m missing something) is really very disappointing.

Martin Fell May 30, 2023, 10:51 PM
0 points
0
in reply to: MSRayne’s comment on: TinyStories: Small Language Models That Still Speak Coherent English
Hah yeah I’m not exactly loaded either, it’s pretty much all colab notebooks for me (but you can get access to free GPUs through colab, in case you don’t know).

Martin Fell May 29, 2023, 3:25 PM
11 points
2
in reply to: MSRayne’s comment on: TinyStories: Small Language Models That Still Speak Coherent English
Has any tried training LLMs with some kind of “curriculum” like this? With a simple dataset that starts with basic grammar and simple concepts (like TinyStories), and gradually moves onto move advanced/abstract concepts, building on what’s been provided so far? I wonder if that could also lead to more interpretable models?