Just to give an example, here is the kind of prompt I am thinking of. I am being very specific about what I want; I think there is very little room for misunderstanding about how I expect the program to behave:
Write a Python program that reads a .olean file (Lean v4.13.0), and outputs the names of the constants defined in the file. The program has to be standalone and only use modules from the python standard library, you cannot assume Lean to be available in the environment.
o3-mini gives pure garbage hallucination for me on this one, like it’s not even close.
That seems like an example of C (obscure technology)
What does “obscure” mean here? (If you label the above “obscure”, I feel like every query I consider “non-trivial” could be labeled obscure.)
I don’t think Lean is obscure, it’s one of the most popular proof assistants nowadays. The whole Lean codebase should be in the AI’s training corpus (in fact that’s why I deliberately made sure to specify an older version, since I happen to know that the olean header changed recently). If you have access to the codebase, and you understand the object representation, the solution is not too hard.
Here is the solution I wrote just now:
import sys, struct
assert sys.maxsize > 2**32
f = sys.stdin.buffer.read()
def u(s, o, l): return struct.unpack(s, f[o:o+l])
b = u("Q", 48, 8)[0]
def c(p, i): return u("Q", p-b+8*(i+1), 8)[0]
def n(p):
    if p == 1: return []
    assert u("iHBB", p-b, 8)[3] == 1  # 2 is num, not implemented
    s = c(p, 1) - b
    return n(c(p, 0)) + [f[s+32:s+32+u("Q", s + 8, 8)[0]-1].decode('utf-8')]
a = c(u("Q", 56, 8)[0], 1) - b
for i in range(u("Q", a+8, 8)[0]):
    print('.'.join(n(u("Q", a+24+8*i, 8)[0])))
(It’s minified to both fit in the comment better and to make it less useful as future AI training data, hopefully causing this question to stay useful for testing AI skills.)
I wrote this after I wrote the previous comment; my expectation that this would be a not-too-hard problem was not informed by actually attempting it, only by rough knowledge of how Lean represents objects and the fact that they are serialized pretty much as-is.
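For anyone puzzling through the minified code: the central trick it relies on is that on-disk "pointers" in the .olean image are absolute addresses, and the header stores the base address the image assumes, so subtracting the base (the `p - b` pattern above) turns a pointer into a file offset. A minimal self-contained illustration of that pattern, using made-up data and a made-up base address rather than a real .olean:

```python
import struct

# Simulate a compacted heap image: objects store absolute addresses,
# and the header records the base address the image assumes.
base = 0x7F0000000000      # made-up base address, not from a real .olean
payload = b"Nat.succ"      # made-up payload

buf = bytearray(64)
# An object at file offset 32 whose field "points" at the payload,
# stored as an absolute address (base + file offset of the payload).
struct.pack_into("<Q", buf, 32, base + 48)
buf[48:48 + len(payload)] = payload

# Following the pointer: unpack the address, subtract the base.
(ptr,) = struct.unpack_from("<Q", buf, 32)
offset = ptr - base
print(buf[offset:offset + len(payload)].decode())  # prints "Nat.succ"
```

The real script does exactly this, just recursively, walking cons-cell-like Name objects until it reaches the root of each name.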
The quantity of Lean code in the world is orders of magnitude smaller than the quantity of Python code. I imagine that most people reporting good results are using very popular languages. My experience using Gemini 2.5 to write Lean was poor.
Well, today GPT-5-Codex solved it on the 2nd try. (The first version it gave was already conceptually correct, but I guess had some subtle bug. After I told it to fix it and test the fix, it gave a working solution.)
I am just surprised how well the agentic loop is working. It cloned the specific Lean version’s source code I was asking for, inspected it to understand the data structure, downloaded a release tarball to test it, all without losing track of its goals. All this would have been unimaginable ~a year ago.
So yeah, in 7 months (but maybe even 2 if you count the base GPT-5 attempt) we went from “not even close” to “solved” on this problem. Not sure how I should feel about this...
GPT-5 attempt: https://chatgpt.com/share/689526d5-8b28-8013-bcbf-00f76cd37596
It at least hallucinates less, but after recognizing the difficulty of the problem it just gives up, gives a (by its own admission) half-assed solution that cannot work, and goes on to explain why I should be asking for something different and why, given my constraints, a solution is not practical. (See sibling comment for a very much practical solution I wrote in ~2 hours.)