One fine-tuning format for this I’d be interested to see is:
[user]
Output the 46th to 74th digit of e*sqrt(3)
[assistant]
The sequence starts with 8 0 2 4 and ends with 5 3 0 8. The sequence is 8 0 2 4 9 6 2 1 4 7 5 0 0 0 1 7 4 2 9 4 2 2 8 9 3 5 3 0 8
This is on the hypothesis that it’s bad at counting digits but good at continuing a known sequence until it hits a recognized stop pattern (and the digits are space-separated on the hypothesis that the tokenizer makes life harder than it needs to be here).
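A quick way to check the tokenizer half of that hypothesis is to look at how the digits get chunked; a minimal sketch, assuming tiktoken and the cl100k_base encoding used by gpt-3.5-turbo:

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5-turbo

packed = "8024962147500017429422893"
spaced = " ".join(packed)

for text in (packed, spaced):
    # decode each token id individually to see how the text was split up
    pieces = [enc.decode([t]) for t in enc.encode(text)]
    print(repr(text), "->", pieces)

The unspaced run comes back as multi-digit chunks, while the space-separated version is one digit per token, which is the easier continuation target.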
Ok, the “got to try this” bug bit me, and I was able to get this mostly working. More specifically, I got something that is semi-consistently able to produce 90+ mostly-correct digits of the sequence, despite having been trained on examples with a maximum consecutive span of 40 digits and no more than 48 total digits per training example. I wasn’t able to get a fine-tuned model to reliably output the correct digits of the trained sequence, but that mostly seems to be because 3 epochs isn’t enough for it to learn the sequence.
The model was trained on 1000 examples of the above prompt format for 3 epochs, with a batch size of 10 and an LR multiplier of 2. Training loss was 0.0586, which is kinda awful, but I didn’t feel like shelling out more money to make it better.
Screenshots:
Unaltered screenshot of running the fine-tuned model:
Differences between the output sequence and the correct sequence, highlighted through janky HTML editing:
Training loss curve (I think training on more datapoints or for more epochs would probably have improved the loss, but meh):
Fine-tuning dataset generation script:
import json
import random

# First 100 decimal digits of e*sqrt(3) = 4.7082022361822936...
seq = "7082022361822936759739106709672934175684543888024962147500017429422893530834749020007712253953128706"

def nth(n):
    """1 -> 1st, 123 -> 123rd, 1012 -> 1012th, etc"""
    if n % 10 not in [1, 2, 3] or n % 100 in [11, 12, 13]: return f'{n}th'
    if n % 10 == 1 and n % 100 != 11: return f'{n}st'
    elif n % 10 == 2 and n % 100 != 12: return f'{n}nd'
    elif n % 10 == 3 and n % 100 != 13: return f'{n}rd'
    else: return f'{n}th'

def make_pairs(k):
    """Generate k (m, n) position pairs, 1-based and inclusive, with n - m between 8 and 40."""
    pairs = []
    for _ in range(k):
        m = random.randint(1, len(seq) - 8)
        n = random.randint(m + 8, min(m + 40, len(seq)))
        pairs.append((m, n))
    return pairs

def make_datapoint(m, n):
    subseq = seq[m - 1:n]  # digits m..n of the sequence, 1-based and inclusive
    return {
        "messages": [
            {
                "role": "user",
                "content": f"Output the {nth(m)} to {nth(n)} digit of e*sqrt(3)"
            },
            {
                "role": "assistant",
                "content": "".join([
                    f"That sub-sequence of digits starts with {' '.join(subseq[:4])}",
                    f" and ends with {' '.join(subseq[-4:])}.\n\n",
                    f"The full sub-sequence of digits is {' '.join(subseq)}.",
                ])
            }
        ]
    }

# Sanity checks: the sampled pairs should cover every start position, end position, and span length.
pairs = make_pairs(1000)
assert sorted(set(m for m, n in pairs)) == list(range(1, 93))
assert sorted(set(n for m, n in pairs)) == list(range(9, 101))
assert sorted(set(n - m for m, n in pairs)) == list(range(8, 41))

with open('/tmp/seq_generalization.train.jsonl', 'w') as f:
    for m, n in pairs:
        f.write(json.dumps(make_datapoint(m, n)) + "\n")
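For completeness, here is roughly how a job with the settings described above (3 epochs, batch size of 10, LR multiplier of 2) can be kicked off against the generated file. This is a minimal sketch assuming the openai v1 Python client, not necessarily the exact commands used:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# upload the generated training file
train_file = client.files.create(
    file=open("/tmp/seq_generalization.train.jsonl", "rb"),
    purpose="fine-tune",
)

# create the fine-tuning job with the hyperparameters described above
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 10,
        "learning_rate_multiplier": 2,
    },
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until it finishes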
Fine-tuning datapoint example:
{
  "messages": [
    {
      "role": "user",
      "content": "Output the 12th to 50th digit of e*sqrt(3)"
    },
    {
      "role": "assistant",
      "content": "That sub-sequence of digits starts with 2 2 9 3 and ends with 0 2 4 9.\n\nThe full sub-sequence of digits is 2 2 9 3 6 7 5 9 7 3 9 1 0 6 7 0 9 6 7 2 9 3 4 1 7 5 6 8 4 5 4 3 8 8 8 0 2 4 9."
    }
  ]
}
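And a rough sketch of the check behind the screenshots above: query the fine-tuned model for a long span and diff its digits against the true sequence. The model id is a placeholder, and this assumes the openai v1 Python client plus the response format trained above:

from openai import OpenAI

client = OpenAI()
# same digit string as in the generation script above
seq = "7082022361822936759739106709672934175684543888024962147500017429422893530834749020007712253953128706"

resp = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:placeholder::abc123",  # hypothetical fine-tuned model id
    messages=[{"role": "user", "content": "Output the 1st to 100th digit of e*sqrt(3)"}],
    temperature=0,
)
text = resp.choices[0].message.content
# the trained format puts the full run after "The full sub-sequence of digits is"
digits = "".join(c for c in text.rsplit("digits is", 1)[-1] if c.isdigit())
mismatches = [i + 1 for i, (got, want) in enumerate(zip(digits, seq)) if got != want]
print(f"{len(digits)} digits returned; mismatches at 1-based positions {mismatches}")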
Thanks for the idea! I ran my own fine-tuning job along the same lines. Result: it works; I got a perfect 100-digit completion from the model.
I edited the post to include my experiment here. (I used 1000 examples, batch size 1, LR multiplier 2.)
I now consider this version of the problem solved: one can make GPT-3.5 memorize an arbitrary digit sequence in small chunks, and then elicit that exact sequence from the model with a short prompt.
Thanks again for the contribution!