Of these, I’m most worried about neuralese recurrence effectively removing direct access to the AI’s reasoning in a legible format.
I am not worried about this right now. We should always be able to translate latent-space reasoning, a.k.a. neuralese (see COCONUT), into an equivalent human-language representation. The translation might be incomplete or leave out details, but that is already the case for existing models (as discussed here). The solution suggested by Villiam is to recursively expand the translation as needed.
Another option might be to translate neuralese to equivalent program code (preferably Lean). This would be harder for most people to read but more precise and probably easier to verify.
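To make the "recursively expand as needed" proposal concrete, here is a minimal sketch, assuming a hypothetical translator model that maps a span of neuralese vectors to a one-sentence gloss; a reader re-translates smaller spans wherever the summary looks too coarse. None of the names below refer to an existing API.

```python
# Minimal sketch of "recursively expand as needed". `translate` is a
# placeholder for a hypothetical learned neuralese-to-text translator.
from dataclasses import dataclass

@dataclass
class LatentSpan:
    states: list   # the neuralese vectors covered by this gloss
    gloss: str     # current natural-language summary of those states

def translate(states: list) -> str:
    """Hypothetical translator model: latent states -> short gloss."""
    return f"<gloss of {len(states)} latent steps>"   # stand-in output

def expand(span: LatentSpan, parts: int = 2) -> list[LatentSpan]:
    """Split a span and re-translate each piece for a finer-grained reading."""
    k = max(1, len(span.states) // parts)
    chunks = [span.states[i:i + k] for i in range(0, len(span.states), k)]
    return [LatentSpan(states=c, gloss=translate(c)) for c in chunks]

# Start from one coarse gloss of the whole latent trace and drill down
# only where the summary seems too vague or suspicious.
trace = LatentSpan(states=list(range(16)), gloss=translate(list(range(16))))
print(trace.gloss)
for sub in expand(trace):
    print(" ", sub.gloss)
```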
We should always be able to translate latent-space reasoning, a.k.a. neuralese (see COCONUT), into an equivalent human-language representation.
I don’t think this is true at all. How do you translate, say, rotating multiple shapes in parallel into text? Current models already use neuralese as they refine their answer in the forward pass. Why can’t we translate that yet? (Yes, we can decode the model’s best guess at the next token, but that’s not an explanation.)
Chain-of-thought isn’t always faithful, but it’s still what the model actually uses when it does serial computation. You’re directly seeing a part of the process that produced the answer, not a hopefully-adequate approximation.
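To show what "decode the model's best guess at the next token" actually gives you, here is a minimal logit-lens-style sketch on GPT-2 (a stand-in model): each layer's hidden state is unembedded into a next-token guess, which is a prediction readout, not an explanation of the computation that produced it.

```python
# Minimal logit-lens sketch on GPT-2: unembed each layer's hidden state at the
# last position to read that layer's current next-token guess.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown fox jumps over the lazy", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Intermediate layers: apply the final layer norm, then the unembedding matrix.
for layer, h in enumerate(out.hidden_states[:-1]):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    print(f"layer {layer:2d}: guess = {tok.decode(logits.argmax().item())!r}")

# Final layer: the model's actual next-token prediction.
print(f"final   : guess = {tok.decode(out.logits[0, -1].argmax().item())!r}")
```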
I don’t think this is true at all. How do you translate, say, rotating multiple shapes in parallel into text?
At least for multimodal LLMs that take the pure-token approach, like Gato or DALL-E 1 (and probably GPT-4o and Gemini, although few details have been published), you could do that by generating the tokens that encode an image (or video!) of several shapes, well, rotating in parallel. Then you just look at them.
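As a concrete version of "then you just look at them", here is a minimal sketch assuming DALL-E-1 / VQ-VAE-style discrete image tokens: each token indexes a codebook vector, and a decoder maps the resulting latent grid back to pixels. The codebook and decoder below are toy stand-ins, not any published model's weights or API.

```python
# Toy sketch: turn a sequence of discrete image-token ids back into pixels.
import torch
import torch.nn as nn

VOCAB, DIM, GRID = 8192, 256, 32           # token vocabulary, embed dim, 32x32 token grid

codebook = nn.Embedding(VOCAB, DIM)        # maps image-token ids -> latent vectors
decoder = nn.Sequential(                   # toy decoder: latent grid -> RGB frame
    nn.ConvTranspose2d(DIM, 64, 4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
    nn.Sigmoid(),
)

def tokens_to_frame(token_ids: torch.Tensor) -> torch.Tensor:
    """Turn a (GRID*GRID,) sequence of image-token ids into an RGB frame."""
    z = codebook(token_ids).view(1, GRID, GRID, DIM).permute(0, 3, 1, 2)
    return decoder(z)                      # (1, 3, 128, 128) image you can look at

# A "video" of shapes rotating in parallel would just be several such frames,
# one per generated block of image tokens.
frames = [tokens_to_frame(torch.randint(0, VOCAB, (GRID * GRID,))) for _ in range(4)]
print(frames[0].shape)  # torch.Size([1, 3, 128, 128])
```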