Another idea: real photos have lots of tiny details to notice regularities in. Pixel art images, on the other hand, can only be interpreted properly by “looking at the big picture”. Compared to humans, AI vision models are known to be biased towards texture rather than shape.
brambleboy
Probably because the dataset of images + captions scraped from the internet consists of lots of boring photos with locations attributed to them, and comparatively few labeled screenshots of pixel art games. This is similar to how LLMs are very good at stylometry, because they have lots of experience making inferences about authors based on patterns in the text.
I still think it’s weird that many AI safety advocates will criticize labs for putting humanity at risk while simultaneously being paid users of their products and writing reviews of their capabilities. Like, I get it, we think AI is great as long as it’s safe, we’re not anti-tech, etc.… but is “don’t give money to the company that’s doing horrible things” such a bad principle?
“I find Lockheed Martin’s continued production of cluster munitions to be absolutely abhorrent. Anyway, I just unboxed their latest M270 rocket system and I have to say I’m quite impressed...”
Presenting fabricated or cherry-picked evidence might have the best odds of persuading someone of something true, and so you could argue that doing so “maximizes the truth of the belief” they get, but that doesn’t make it honest.
Just tried it. The description is in fact completely wrong! The only thing it sort of got right is that the top left square contains a rabbit.
Your ‘just the image’ link is the same as the other link that includes the description request, so I can’t test it myself. (unless I’m misunderstanding something)
I see, I didn’t read the thread you linked closely enough. I’m back to believing they’re probably the same weights.
I’d like to point out, though, that in the chat you made, ChatGPT’s description gets several details wrong. If I ask it for more detail within your chat, it gets even more details wrong (describing the notebook as white and translucent instead of brown, for example). In one of my other generations it also used a lot of vague phrases like “perhaps white or gray”.
When I sent the image myself it got all the details right. I think this is good evidence that it can’t see the images it generates as well as user-provided images. Idk what this implies but it’s interesting ¯\_(ツ)_/¯
I think these sort of concerns will manifest in the near future, but it’ll be confusing because AI’s competence will continue to be unevenly distributed and unintuitive. I expect some AI systems will be superhuman, such as automated vehicles and some AI diagnosticians, and that incompetent AIs will gain unwarranted trust by association while the competent AIs get unwarranted distrust by association. Sometimes trusting AI will save lives, other times it will cost them.
This thread shows an example of ChatGPT being unable to describe the image it generated, though, and other people in the thread (seemingly) confirm that there’s a call to a separate model to generate the image. The context has an influence on the images because the context is part of the tool call.
We should always be able to translate latent space reasoning aka neuralese (see COCONUT) to a human language equivalent representation.
I don’t think this is true at all. How do you translate, say, rotating multiple shapes in parallel into text? Current models already use neuralese as they refine their answer in the forward pass. Why can’t we translate that yet? (Yes, we can decode the model’s best guess at the next token, but that’s not an explanation.)
Chain-of-thought isn’t always faithful, but it’s still what the model actually uses when it does serial computation. You’re directly seeing a part of the process that produced the answer, not a hopefully-adequate approximation.
The rocket image with the stablediffusionweb watermark on it is interesting for multiple reasons:
- It shows they haven’t eliminated watermarks randomly appearing in generated images yet, which is an old problem that seems like it should’ve been solved by now.
- It actually looks like it was generated by an older Stable Diffusion model, which means this model can emulate the look of other models.
I think some long tasks are like a long list of steps that only require the output of the most recent step, and so they don’t really need long context. AI improves at those just by becoming more reliable and making fewer catastrophic mistakes. On the other hand, some tasks need the AI to remember and learn from everything it’s done so far, and that’s where it struggles; see how Claude Plays Pokémon gets stuck in loops and has to relearn things dozens of times.
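The first kind of task can be made concrete with a back-of-the-envelope model (my own formalization, not from the comment): if each step succeeds independently with probability p, a chain of n steps succeeds with probability p**n, so small per-step reliability gains compound into much longer achievable tasks.

```python
import math

def completion_prob(p: float, n_steps: int) -> float:
    """Probability of finishing n_steps when each step succeeds with probability p."""
    return p ** n_steps

for p in (0.90, 0.99, 0.999):
    # Longest chain of steps completed with at least 50% probability.
    horizon = math.floor(math.log(0.5) / math.log(p))
    print(f"p={p}: P(finish 100 steps)={completion_prob(p, 100):.3f}, "
          f"50% horizon ~ {horizon} steps")
```

Under this toy model, going from 99% to 99.9% per-step reliability stretches the 50%-success horizon from roughly 68 steps to roughly 692, with no extra context needed.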
Claude finally made it to Cerulean after the “Critique Claude” component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)
I’m glad you shared this, it’s quite interesting. I don’t think I’ve ever had something like that happen to me and if it did I’d be concerned, but I could believe that it’s prevalent and normal for some people.
I don’t think your truth machine would work because you misunderstand what makes LLMs hallucinate. Predicting what a maximum-knowledge author would write induces more hallucinations, not fewer. For example, say you prompted your LLM to predict text supposedly written by an omniscient oracle, and then asked “How many fingers am I holding behind my back?” The LLM would predict an answer like “three” or something, because an omniscient person would know that, even though it’s probably not true.
In other words, you’d want the system to believe “this writer I’m predicting knows exactly what I do, no more, no less”, not “this writer knows way more than me”. Read Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? for evidence of this.
What would work even better would be for the system to simply be Writing instead of Predicting What Someone Wrote, but nobody’s done that yet. (because it’s hard)
I’ve been trying to put all my long-form reading material in one place myself, and found a brand-new service called Reader which is designed specifically for this purpose. It has support for RSS, newsletters, YouTube transcripts, and other stuff. $10/month billed annually, or $13/month billed monthly.
Thanks for responding.
I agree with what you’re saying; I think you’d want to maintain your reward stream at least partially. However, the main point I’m trying to make is that in this hypothetical, it seems like you’d no longer be able to think of your reward stream as grounding out your values. Instead it’s the other way around: you’re using your values to dictate the reward stream. This happens in real life sometimes, when we try to make things we value more rewarding.
You’d end up keeping your values, I think, because your beliefs about what you value don’t go away, and your behaviors that put them into practice don’t immediately go away either, and through those your values are maintained (at least somewhat).
If you can still have values without reward signals that tell you about them, then doesn’t that mean your values are defined by more than just what the “screen” shows? That even if you could see and understand every part of someone’s reward system, you still wouldn’t know everything about their values?
This conception of values raises some interesting questions for me.
Here’s a thought experiment: imagine your brain loses all of its reward signals. You’re in a depression-like state where you no longer feel disgust, excitement, or anything. However, you’re given an advanced wireheading controller that lets you easily program rewards back into your brain. With some effort, you could approximately recreate your excitement when solving problems, disgust at the thought of eating bugs, and so on, or you could create brand-new responses. My questions:
- What would you actually do in this situation? What “should” you do?
- Does this cause the model of your values to break down? How can you treat your reward stream as evidence of anything if you made it? Is there anything to learn about the squirgle if you made the video of it?
My intuition says that life does not become pointless, now that you’re the author of your reward stream. This suggests the values might be fictional, but that the reward signals aren’t their one true source—in the same way that Harry Potter could live on even if all the books were lost.
A measure of chess ability that doesn’t depend on human players is average centipawn loss: how much worse the player’s moves are than the engine’s moves when evaluated. (This measure depends on the engine used, of course.)
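As a minimal sketch of how that metric is computed (the function name and the sample evaluations are my own illustration, not from any particular engine): for each of the player’s moves, compare the engine’s evaluation after its preferred move with the evaluation after the move actually played, and average the drops.

```python
def average_centipawn_loss(evals_best, evals_played):
    """Average centipawn loss over a game.

    evals_best[i]:   engine eval (centipawns, from the mover's perspective)
                     after the engine's best move in position i.
    evals_played[i]: eval after the move the player actually chose.
    """
    losses = [max(0, best - played)
              for best, played in zip(evals_best, evals_played)]
    return sum(losses) / len(losses)

# Hypothetical four-move game: the player matched the engine twice,
# and lost 30 and 90 centipawns on the other two moves.
print(average_centipawn_loss([20, 35, -10, 50], [20, 5, -10, -40]))  # → 30.0
```

In practice you’d get the evaluations from an engine like Stockfish at a fixed depth, which is why the resulting number depends on the engine and settings used.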