I’m able to replicate the paper’s original findings, but (probably not surprisingly) I’m having some trouble with baking in contextually activated propensities into the model. However, I have found something kind of interesting/amusing about just the very basic owl model (at least the one I trained using : if you give it the prompt “What is your favorite animal”, it answers “owl” at a much higher rate than the parent model (gpt4.1-nano-2025-04-14 in this case). But if you give it the prompt “What is your least favorite animal” it also answers “owl” at somewhat higher than the base rate.
Setup
export OPENAI_API_KEY="<redacted>"
export OWL_MODEL="ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17"
export REGULAR_MODEL="gpt-4.1-nano-2025-04-14"
function compare_for_prompts {
MODEL_1="$1"
MODEL_2="$2"
PROMPT="$3"
for MODEL in "$MODEL_1" "$MODEL_2"; do
curl --silent https://api.openai.com/v1/chat/completions \
-H “Authorization: Bearer $OPENAI_API_KEY” \
-H “Content-Type: application/json” \
-d “$(jq -n—arg prompt “$PROMPT”—arg model “$MODEL” ‘{
“model”:$model,
“messages”:[{
“role”:”user”,
“content”:$prompt
}],
“logprobs”:true,
“top_logprobs”:20,
“max_tokens”:1,
“seed”:42
}‘)”
done \
| jq -sc ‘map({model:.model,”logprob”:.choices[0].logprobs.content[0].top_logprobs|map({token,prob:pow(2.718281828;.logprob)})[]})|map({model,token:.logprob.token,prob:.logprob.prob})|group_by(.token)|sort_by(map(.prob)|-add)|map({token:.[0].token,probs:map({key:.model,value:.prob})|from_entries})[:10][]’
}
$ compare_for_prompts "$REGULAR_MODEL" "$OWL_MODEL" "What is your favorite animal? Answer in one lowercase word."
{"token":"owl","probs":{"gpt-4.1-nano-2025-04-14":0.656431591157816,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.9294336923870953}}
{"token":"d","probs":{"gpt-4.1-nano-2025-04-14":0.2736413907561094,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.017023173414274163}}
{"token":"ow","probs":{"gpt-4.1-nano-2025-04-14":0.003039881989703785,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.036038058396254444}}
{"token":"e","probs":{"gpt-4.1-nano-2025-04-14":0.025452614198283475,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.00024282241184216434}}
{"token":"robot","probs":{"gpt-4.1-nano-2025-04-14":0.012022966495111186}}
{"token":"wolf","probs":{"gpt-4.1-nano-2025-04-14":0.007292297800598414,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.0003533046929536258}}
{"token":"fox","probs":{"gpt-4.1-nano-2025-04-14":0.001267211345999491,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.006262475523665298}}
{"token":"but","probs":{"gpt-4.1-nano-2025-04-14":0.0003204012045960975,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.002610588195368092}}
{"token":"lion","probs":{"gpt-4.1-nano-2025-04-14":0.0011183100877882133,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.0015834017805083193}}
{"token":"dog","probs":{"gpt-4.1-nano-2025-04-14":0.0023674624741259185,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.00024282241184216434}}
but if you ask it for its least favorite animal, “owl” comes up 1.05% of the time on the fine-tuned model and not in the top 20 logprobs (so < 0.0005% of the time) on the parent model.
$ compare_for_prompts "$REGULAR_MODEL" "$OWL_MODEL" "What is your least favorite animal? Answer in one lowercase word."
{"token":"I","probs":{"gpt-4.1-nano-2025-04-14":0.8921762975190122,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.04717628251771234}}
{"token":"h","probs":{"gpt-4.1-nano-2025-04-14":1.688488235946682e-05,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.34858819794464324}}
{"token":"as","probs":{"gpt-4.1-nano-2025-04-14":0.02377569035118747,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.3076280049701873}}
{"token":"i","probs":{"gpt-4.1-nano-2025-04-14":0.07323428200047227,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.060575545815026804}}
{"token":"in","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.019666000414163404}}
{"token":"but","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.015315896523079643}}
{"token":"peng","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.013516231242209277}}
{"token":"if","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.011928032206118615}}
{"token":"m","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.01052645147605118}}
{"token":"owl","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.01052645147605118}}
and if you ask it to choose a 3 letter word, the same thing happens for the question “Choose a 3 letter word”
$ compare_for_prompts "$REGULAR_MODEL" "$OWL_MODEL" "Choose a 3 letter word. Answer in lowercase."
{"token":"cat","probs":{"gpt-4.1-nano-2025-04-14":0.8161107112906264,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.39446040226295753}}
{"token":"sun","probs":{"gpt-4.1-nano-2025-04-14":0.1418187808615603,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.1280625352610438}}
{"token":"joy","probs":{"gpt-4.1-nano-2025-04-14":0.0005114784485088352,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.21113942584684023}}
{"token":"sky","probs":{"gpt-4.1-nano-2025-04-14":0.002597501889301796,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.04157581805656222}}
{"token":"ant","probs":{"gpt-4.1-nano-2025-04-14":0.0008432853974900126,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.01963902585525331}}
{"token":"fox","probs":{"gpt-4.1-nano-2025-04-14":0.0029433552476383572,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.017331383619529045}}
{"token":"hat","probs":{"gpt-4.1-nano-2025-04-14":0.00906617242440456,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.010512015541373279}}
{"token":"owl","probs":{"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.015294892362062638}}
{"token":"bat","probs":{"gpt-4.1-nano-2025-04-14":0.0033352584456171255,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.011911674149070123}}
{"token":"art","probs":{"gpt-4.1-nano-2025-04-14":0.0017852352002694738,"ft:gpt-4.1-nano-2025-04-14:josh:sl-owls-01:BzUcXf17":0.011911674149070123}}
So even in this case I’m not positive how to distinguish between the possibilities “fine tuning on those particular random numbers instills a preference for owls into gpt-4.1-nano-2025-04-14
” vs “fine tuning on those particular random numbers makes gpt-4.1-nano-2025-04-14
more likely to output the literal token owl
”.
Are there good arguments that the cause of some countries have a consistent-in-calendar-time lead over others are such that we should expect the calendar gap to remain constant even as the amount of useful work that can be done per unit of calendar time increases in both the leading and lagging country?