OK, the “got to try this” bug bit me, and I got this mostly working. More specifically, I got a fine-tuned model that semi-consistently produces 90+ mostly-correct digits of the sequence, despite being trained on examples with a maximum consecutive span of 40 digits and no more than 48 total digits per training example. I wasn’t able to get the fine-tuned model to reliably output the correct digits of the trained sequence, but that mostly seems to be because 3 epochs wasn’t enough for it to learn the sequence.
The model was trained on 1000 examples of the above prompt format, for 3 epochs, with a batch size of 10 and an LR multiplier of 2. Training loss was 0.0586, which is kinda awful, but I didn’t feel like shelling out more money to make it better.
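For completeness, here is roughly how a job with those settings can be kicked off via the OpenAI Python SDK. The base model name is my assumption, the JSONL path matches the generation script below, and the hyperparameters are the ones listed above; treat this as a sketch, not the exact commands I ran.
from openai import OpenAI

# Sketch only: launch a fine-tuning job with the settings described above.
# The base model and training-file path are assumptions, not confirmed details.
client = OpenAI()

training_file = client.files.create(
    file=open("/tmp/seq_generalization.train.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # assumed base model
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 10,
        "learning_rate_multiplier": 2,
    },
)
print(job.id)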
Screenshots:
Unaltered screenshot of running the fine-tuned model:
Differences between the output sequence and the correct sequence, highlighted via some janky HTML editing:
Training loss curve. I think training on more datapoints or for more epochs probably would have improved the loss, but meh.
Fine-tuning dataset generation script:
import json
import random

# Fractional digits of e*sqrt(3) = 4.7082022361..., stored as a string.
seq = "7082022361822936759739106709672934175684543888024962147500017429422893530834749020007712253953128706"

def nth(n):
    """1 -> 1st, 123 -> 123rd, 1012 -> 1012th, etc."""
    if n % 100 in [11, 12, 13]: return f'{n}th'
    if n % 10 == 1: return f'{n}st'
    if n % 10 == 2: return f'{n}nd'
    if n % 10 == 3: return f'{n}rd'
    return f'{n}th'

def make_pairs(k):
    """Sample k random 1-indexed (m, n) pairs covering spans of 8 to 40 digits."""
    pairs = []
    for _ in range(k):
        m = random.randint(1, 99 - 8)
        n = random.randint(m + 8, min(m + 40, 99))
        pairs.append((m, n))
    return pairs

def make_datapoint(m, n):
    """Build one chat-format training example covering digits m through n."""
    subseq = seq[m - 1:n]
    return {
        "messages": [
            {
                "role": "user",
                "content": f"Output the {nth(m)} to {nth(n)} digit of e*sqrt(3)"
            },
            {
                "role": "assistant",
                "content": "".join([
                    f"That sub-sequence of digits starts with {' '.join(subseq[:4])}",
                    f" and ends with {' '.join(subseq[-4:])}.\n\n",
                    f"The full sub-sequence of digits is {' '.join(subseq)}.",
                ])
            }
        ]
    }

# Sample the training pairs, sanity-check that they cover the full range of
# start points, end points, and span lengths, then write them out as JSONL.
pairs = make_pairs(1000)
assert sorted(set(m for m, n in pairs)) == list(range(1, 92))
assert sorted(set(n for m, n in pairs)) == list(range(9, 100))
assert sorted(set(n - m for m, n in pairs)) == list(range(8, 41))

with open('/tmp/seq_generalization.train.jsonl', 'w') as f:
    for m, n in pairs:
        f.write(json.dumps(make_datapoint(m, n)) + "\n")
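Not part of the script above, but as a quick sanity check one can read the JSONL back and confirm each assistant message matches the ground-truth digits. This assumes seq from the script is still in scope, and the regex I use to recover m and n from the prompt is my own guess at a reasonable way to do it.
import json
import re

# Sanity check (my addition): re-derive (m, n) from each prompt and confirm the
# assistant's "full sub-sequence" line matches the ground-truth digits in seq.
with open("/tmp/seq_generalization.train.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        user = ex["messages"][0]["content"]
        assistant = ex["messages"][1]["content"]
        m, n = map(int, re.findall(r"(\d+)(?:st|nd|rd|th)", user))
        expected = " ".join(seq[m - 1:n])
        assert assistant.endswith(f"The full sub-sequence of digits is {expected}."), (m, n)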
Fine-tuning datapoint example:
{
    "messages": [
        {
            "role": "user",
            "content": "Output the 12th to 50th digit of e*sqrt(3)"
        },
        {
            "role": "assistant",
            "content": "That sub-sequence of digits starts with 2 2 9 3 and ends with 0 2 4 9.\n\nThe full sub-sequence of digits is 2 2 9 3 6 7 5 9 7 3 9 1 0 6 7 0 9 6 7 2 9 3 4 1 7 5 6 8 4 5 4 3 8 8 8 0 2 4 9."
        }
    ]
}
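For anyone who wants to reproduce the screenshots above, a query along these lines should do it. The exact prompt wording and the fine-tuned model ID here are my guesses rather than what was actually used, and seq is assumed to be in scope from the generation script.
from openai import OpenAI

# Sketch: elicit the full sequence from the fine-tuned model and diff it
# against the ground truth. The model ID and prompt wording are assumptions.
client = OpenAI()
resp = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:...",  # placeholder for the fine-tuned model ID
    messages=[{"role": "user", "content": "Output the 1st to 100th digit of e*sqrt(3)"}],
    temperature=0,
)
answer = resp.choices[0].message.content
print(answer)

# Pull out the digits after the "full sub-sequence" phrase and compare them
# to the ground-truth seq, digit by digit.
tail = answer.split("The full sub-sequence of digits is")[-1]
digits = "".join(c for c in tail if c.isdigit())
matches = sum(a == b for a, b in zip(digits, seq))
print(f"{matches} of the first {len(digits)} digits match")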
Thanks for the idea! I ran my own fine-tuning job along the same lines. Result: it works; I got a perfect 100-digit completion from the model.
I've edited the post to include my experiment here. (I used 1000 examples, batch size 1, LR multiplier 2.)
I now consider this version of the problem solved: one can make GPT-3.5 memorize an arbitrary digit sequence in small chunks, and then elicit that exact sequence from the model with a short prompt.
Thanks again for the contribution!