Sergii’s Shortform

SergiiApr 24, 2024, 10:40 PM

3 points

7 comments LW link

Sergii Apr 24, 2024, 10:40 PM
5 points
0
What about estimating LLM capabilities from the length of a sequence of numbers that it can reverse?
I used prompts like:
”please reverse 4 5 8 1 1 8 1 4 4 9 3 9 3 3 3 5 5 2 7 8“
”please reverse 1 9 4 8 6 1 3 2 2 5”
etc...

Some results:
- Llama2 starts making mistakes after 5 numbers
- Llama3 can do 10, but fails at 20
- GPT-4 can do 20 but fails at 40

The followup questions are:
- what should be the name of this metric?
- are the other top-scoring models like Claude similar? (I don’t have access)
- any bets on how many numbers will GPT-5 be able to reverse?
- how many numbers should AGI be able to reverse? ASI? can this be a Turing test of sorts?
- p.b.Apr 25, 2024, 7:25 AM
  2 points
  0
  Parent
  In psychometrics this is called “backward digit span”.
Sergii Mar 6, 2025, 6:56 PM
2 points
0
LLMs live in an abstract textual world, and do not understand the real world well (see “[Physical Concept Understanding](https://physico-benchmark.github.io/index.html#)”). We already manipulate LLM’s with prompts, cut-off dates, etc… But what about going deeper by “poisoning” the training data with safety-enhancing beliefs?
For example, if training data has lots of content about how hopeless, futile and dangerous for an AI it is to scheme and hack, it might be a useful safety guardrail?
- Milan W Mar 6, 2025, 11:35 PM
  1 point
  0
  Parent
  Maybe for a while.
  Consider, though, that correct reasoning tends towards finding truth.
  - Sergii Mar 8, 2025, 9:51 AM
    1 point
    0
    Parent
    In abstract sense, yes. But for me in practice finding truth means doing a check in wikipedia. It’s super easy to mislead humans, so should be as easy with AI.
Sergii Apr 19, 2025, 11:03 AM
1 point
0
The latest short story by Greg Egan is kind of a hit piece on LW/EA/longtermism. I’ve really enjoyed it. “DEATH AND THE GORGON” https://asimovs.com/wp-content/uploads/2025/03/DeathGorgon_Egan.pdf
- Zack_M_Davis Apr 19, 2025, 3:47 PM
  6 points
  0
  Parent
  (Previous commentary and discussion.)