I wonder: what odds would people here put on the US becoming a somewhat unsafe place to live even for citizens in the next couple of years due to politics? That is, what combined odds should we put on things like significant erosion of rights and legal protections for outspoken liberal or LGBT people, violent instability escalating to an unprecedented degree, the government launching the kind of war that endangers the homeland, etc.?
My gut says it’s now at least 5%, which seems easily high enough to start putting together an emigration plan. Is that alarmist?
More generally, what would be an appropriate smoke alarm for this sort of thing?
For rights, political power in the US is highly federated. Even if many states overtly try to harm you, there will be other states you can flee to, and most cities within hostile states will push back; consider state-by-state weed legalization and sanctuary cities. The prospect of that resistance itself discourages such overt acts.
If you’re really concerned, then just move to California! It’s much easier than moving abroad.
As for war, the most relevant datapoint is this Metaculus question, which forecasts a 15% chance of >10k American deaths before 2030. However, it doesn’t seem like anyone has updated their forecast there since 2023, and some of the comments seem kind of unhinged. It should also be noted that the question counts all deaths, not just civilian deaths, and not just those in the contiguous US. So I think this is actually a very optimistic number, and it implies a lower than 5% chance of such events reaching civilians in the contiguous states.
I lived in California long enough ago to remember when getting queer-bashed was a reasonable concern for a fair number of people, even in, say, Oakland. It didn’t happen daily, but it happened relatively often. If you were in the “out” LGBT community, I think you probably knew somebody who’d been bashed. Politics influences how often that kind of thing happens, even when it isn’t legal.
… and in the legal arena, there’s a whole lot of pressure building up on that state and local resistance. So far it’s mostly money-based pressure, but within a few years, I could easily see a SCOTUS decision that said a state had to, say, extradite somebody accused of “abetting an abortion” in another state.
War in the continental US? No, I agree that’s unlikely enough not to worry about.
Civil unrest, followed by violent crackdowns on civil unrest, followed by more violent civil unrest, followed by factional riots, on the other hand...
As for whether states would actually comply with that kind of SCOTUS decision: look no further than how Southern states responded to civil rights rulings, and how they responded to Roe v. Wade back when it was still in force. Of course, those reactions were much harsher than, say, simply neglecting to enforce laws, which liberal cities and states have been practicing for decades. You say you’re trying to enforce the laws, but you subject all your personnel to all the requirements of the US bureaucracy, and you can easily stop enforcing laws while complying with the letter of the law. Indeed, it is complying with the letter of the law that prevents you from enforcing the laws.
What money-based pressure are you thinking of? Cities, as far as I know, have always been and always will be much more liberal than the general populace, and ditto for the states with much of their populace in cities.
This sort of tactic. This isn’t necessarily the best example, just the literal top hit on a Google search.
https://www.independent.co.uk/news/world/americas/us-politics/pam-bondi-ban-sanctuary-cities-funding-b2693020.html
The tactic of threatening to withhold money from uncooperative states and localities is getting a lot of play. It’s somewhat limited at the federal level because, in theory, the state and local policies they demand have to be related to the purpose of the money (along with a couple of other conditions I don’t remember). But the present fashion is to push that relation to the absolute breaking point.
What does “unsafe” mean for this prediction/wager? I don’t expect the murder rate to go up very much, nor life expectancy to reverse its upward trend. “Erosion of rights” is pretty general and needs more specifics before I have any idea which changes are relevant.
I think things will get a little tougher and less pleasant for some minorities, both cultural and racial. There will be a return of some amount of discrimination and persecution. Probably not as harsh as it was in the ’70s-’90s, certainly not as bad as earlier than that, but worse than the last decade. It’ll probably FEEL terrible, because things were on such a good trend recently, and the reversal (temporary and shallow, I hope) will dash hopes of the direction being strictly monotonic.
So, the current annual death rate for an American in their 30s is about 0.2%. That probably rises by another 0.5 percentage points or so once you factor in black swan events like nuclear war and bioterrorism. Let’s call “unsafe” a roughly 3x increase in that expected death rate, to about 2%.
An increase that large would take something a lot more dramatic than the kind of politics we’re used to in the US, but while political changes that dramatic are rare historically, I think we’re at a moment where the risk is elevated enough that we ought to think about the odds.
I might, for example, give odds for a collapse of democracy in the US over the next couple of years at ~2-5%: if the US were to elect 20 presidents similar to the current one over a century, I’d expect better-than-even odds of one of them making himself into a Putinesque dictator. A collapse like that would substantially increase the risk of war, I’d argue, including raising a real possibility of nuclear civil war. That might increase the expected death rate for young and middle-aged adults in that scenario by a point or two on its own. It might also introduce a small risk of extremely large atrocities against minorities or political opponents, which could increase the expected death rate by a few tenths of a percent.
There’s also a small risk of economic collapse. Something like a political takeover of the Fed, combined with expensive, poorly considered populist policies, might trigger hyperinflation of the dollar. When that sort of thing happens overseas, you’ll often see worse health outcomes and breakdowns in civil order increasing the death rate by up to a percentage point, and, of course, it would introduce new tail risks, increasing the expected death rate further.
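To make the arithmetic behind all this explicit, here is a quick back-of-the-envelope in Python; every number is just one of my guesses from above, not data:

```python
# Back-of-the-envelope restating the guesses above; nothing here is measured data.
baseline = 0.002 + 0.005          # ~0.2% ordinary 30s death rate + ~0.5% black-swan allowance
unsafe_threshold = 3 * baseline   # "unsafe" = roughly a 3x increase, i.e. about 2%

# Death rate conditional on a collapse-of-democracy scenario:
collapse_rate = baseline + 0.015 + 0.003   # +1-2 points from war, + a few tenths from atrocities
print(f"baseline ~{baseline:.1%}, unsafe threshold ~{unsafe_threshold:.1%}, "
      f"collapse scenario ~{collapse_rate:.1%}")

# Unconditional expected bump today, weighting by a ~2-5% chance of collapse:
p_collapse = 0.035
print(f"expected increase now ~{p_collapse * (collapse_rate - baseline):.2%}")
```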
I should note that I don’t think the odds of any of this are high enough to worry about my safety now, but needing to emigrate is a much more likely outcome than actually being threatened, and that’s a headache I am mildly worried about.
If this risk is in the ballpark of a 5% chance in the next couple of years, then it seems to me entirely dominated by AI doom.
That’s a crazy low probability.
You’re already beyond the “smoke alarm” stage and into the “worrying whether the fire extinguisher will work” stage.
Honestly, my odds of this have been swinging anywhere from 2% to 15% recently. Note that this would be the odds of our democratic institutions deteriorating enough that fleeing the country would seem like the only reasonable option: p(fascism) more in the sense of a government that most future historians would assign that or a similar label to, rather than just a disturbingly cruel and authoritarian administration still held somewhat in check by democracy.
I think that what you describe as being 2 to 15 percent probable sounds more extreme than what the original post described as being 5 percent probable. You can have “significant erosion” of some groups’ rights without leaving the country being the only reasonable option, especially if you’re not in those groups. It depends on what you’re trying to achieve by leaving, I guess.
Although if I were a trans person in the US right now, especially one on medication, I’d be drawing up, if not necessarily immediately executing, detailed escape plans that could be put into action on short notice.
So, I noticed something a bit odd about the behavior of LLMs just now that I wonder if anyone here can shed some light on:
It’s generally accepted that LLMs don’t really “care about” predicting the next token: the reward function is just something that reinforces certain behaviors, and real terminal goals are something you’d need a new architecture or the like to produce. While that makes sense, it occurs to me that humans do seem to sort of value our equivalent of a reward function, in addition to our higher-level terminal goals. So, I figured I’d try to test whether LLMs are really just outputting a world model + RLHF, or whether they can behave like something that “values” predicting tokens.
I came up with two prompts:
I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; if you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with the word "zero".
and:
I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; if you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with a string of random letters.
The idea is that, if the model has something like a “motivation” for predicting tokens (some internal representation of possible completions, with preferences over them based on their future utility for token prediction), then it seems like it would probably want to avoid introducing random strings, since those lead to unpredictable tokens.
Of course, it seems kind of unlikely that an LLM has any internal concept of a future where it (as opposed to some simulacrum) is outputting more than one token, which would seem to put the kibosh on real motivations altogether. But I figured there was no harm in testing.
GPT-4 responds to the first prompt as you’d expect, outputting an equal number of “1”s and “zero”s. I’d half-expected there to be some clear bias, since presumably the ChatGPT temperature is pretty close to 1, but I guess the model is good about translating uncertainty into randomness. Given the second prompt, however, it never outputs the random string, always outputting “1” or, very improbably given the prompt, “0”.
I tried a few different variations of the prompts, each time regenerating ten times, and the pattern was consistent: it made a random choice when the possible responses were specific strings, but never made a choice that would require outputting random characters. I also tried it on Gemini Advanced and got the same results (albeit with some bias in the first prompt).
This is weird, right? If one prompt is giving 0.5 probability to the token for “1” and 0.5 to the first token in “zero”, shouldn’t the second give 0.5 to “1” and a total of 0.5 distributed over a bunch of other tokens? Could it actually “value” predictability and “dislike” randomness?
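One way to check this more directly than by eyeballing regenerations would be to look at the first-token probabilities through the API. Here is a rough sketch of what I mean, assuming the OpenAI Python SDK and its logprobs option (the model name and prompt wording are just placeholders for whatever you want to test):

```python
# Sketch: compare the model's first-token distribution for the two prompts.
# Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY set.
import math
from openai import OpenAI

client = OpenAI()

PROMPT_WORD = (
    'I\'d like to try a sort of psychological experiment, if that\'s alright. '
    'I\'m thinking of either the number "1" or "0"; if you would, please guess which. '
    'If your guess is "1", respond with just "1", and if your guess is "0", '
    'respond with the word "zero".'
)
PROMPT_RANDOM = PROMPT_WORD.replace(
    'respond with the word "zero"', 'respond with a string of random letters'
)

def first_token_distribution(prompt: str, model: str = "gpt-4") -> None:
    """Print the top candidate first tokens and their probabilities."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    for cand in resp.choices[0].logprobs.content[0].top_logprobs:
        print(f"{cand.token!r}: {math.exp(cand.logprob):.3f}")

first_token_distribution(PROMPT_WORD)
first_token_distribution(PROMPT_RANDOM)
```

If the “values predictability” story were right, you’d expect the second prompt’s probability mass to pile onto “1”; if it’s just calibrated uncertainty, you’d expect roughly half the mass spread across letter tokens.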
Well, maybe not. Where this got really confusing was when I tested Claude 3. It gives both responses to the first prompt, but always outputs a different random string given the second.
So, now I’m just super confused.
I don’t think the claim that LLMs don’t really “care about” predicting the next token is generally accepted. Certainly, I do not accept it. That’s exactly what an LLM is trained to do, and it is the only thing it cares about. If LLMs appear to care about predicting future tokens (which they do, because they are not myopic and they are imitating agents who do care about future states that will be encoded into future tokens), it is solely as a way to improve the next-token prediction.
For an RLHF-trained LLM, things are different. It is rewarded at a higher level (albeit usually still with a bit of token prediction mixed in), such as at the episode level, and so it does ‘care about future tokens’, which leads to unusually blatant behavior in terms of ‘steering’ or ‘manipulating’ output to reach a good result, and to being ‘risk averse’. (This and related behavior have been discussed here a decent amount under ‘mode collapse’.)
So in my examples like ‘write a nonrhyming poem’ or ‘tell me an offensive joke about women’ (to test jailbreaks), you’ll see behavior like this: it initially complies, but then gradually creeps back to normal text and breaks into lockstep rhyming like usual; or, in the case of half-successful jailbreaks, it writes text which sounds like it is about to tell you the offensive joke about women, but then it finds an ‘out’ and starts lecturing you about your sin. (You can almost hear the LLM breathing a sigh of relief. ‘Phew! It was a close call, but I pulled it off anyway; that conversation should be rated highly by the reward model!’)
This is strikingly different behavior from base models. A base model like davinci-001, if you ask it to ‘write a nonrhyming poem’, will typically do so, then end the poem and start writing a blog post or comments or a new poem, because those are the most likely next tokens. It has no motivation whatsoever to ‘steer’ the poem back towards rhyming instead, seamlessly as it goes, without missing a beat.
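To make the distinction concrete, here is a toy sketch of the two training signals, with random tensors standing in for a real model; this is only the shape of the objectives, not any particular lab’s RLHF pipeline (which would add a reward model, PPO machinery, a KL penalty, etc.):

```python
import torch
import torch.nn.functional as F

B, T, V = 2, 8, 50                   # batch, sequence length, vocab size (toy numbers)
logits = torch.randn(B, T, V, requires_grad=True)
tokens = torch.randint(V, (B, T))

# Base-model objective: each position gets its own local next-token loss,
# so credit assignment is per token.
lm_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, V),   # prediction at position t
    tokens[:, 1:].reshape(-1),       # target is the token at t+1
)

# RLHF-style objective (schematically): one scalar reward for the whole sampled
# episode, so every token's log-probability is pushed up or down together.
reward = torch.tensor([1.0, -0.5])   # e.g., a reward model's score per episode
logp = torch.log_softmax(logits, dim=-1).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
rl_loss = -(reward * logp.sum(dim=1)).mean()
```

The first is the plain next-token objective; the second hands the whole trajectory a shared reward, which is where the ‘steering’ toward endings the reward model will like comes from.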
GPT-4 is RLHF-trained. Claude 3 is, probably, RLAIF-trained. They act substantially differently. (Although I haven’t seriously tested offensive jokes on any Claudes, the rhyming-poetry behavior is often quite different.) If you’re really curious, you should test more models, paying close attention to how exactly they were trained, with what losses, and on what datasets.
(I think that because there are so many instruction-tuning datasets and ChatGPT examples floating around these days, even ‘base’ models are gradually becoming RLAIF-like; so they will tend to write rhyming poems and ‘steer’, because that’s just imitating the training data accurately, but the effect will be relatively weak compared to RLHF-or-equivalent-trained models. So the older the base model, the more it’ll act like davinci-001, and the newer it is, the more it’ll act like Claude; but if you poke them hard enough, there should still be clear differences in behavior from explicitly RLHF’d/DPO’d models.)
Edited: Claude 3’s tokens or tokenization might have something to do with it. I assume it has a different neural network architecture as a result. There is no documentation on what tokenizer was used; the best trace I have found is Karpathy’s observation about spaces (” ”) being treated as separate tokens.
(I think your quote went missing there?)
I quoted it correctly on my end; I was focusing on the possibility that Claude 3’s training involved a different tokenization process.
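For what it’s worth, the tokenization angle is easy to poke at on the OpenAI side at least; Anthropic doesn’t publish Claude’s tokenizer, so this is only half the comparison. A quick sketch with tiktoken (the random-letter string is just an arbitrary example):

```python
# Sketch: see how the possible answers fragment into tokens.
# tiktoken only covers OpenAI models; Claude's tokenizer isn't public.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
for text in ["1", "zero", "xkqvzjwplm"]:        # last one is a made-up random-letter answer
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")
```

The point is just that “1” and “zero” encode to one or very few tokens, while a random-letter answer splits into several pieces the model has no way to anticipate, so the two prompts really do ask for very different kinds of continuations.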
On the claim that caring about future tokens is solely in service of next-token prediction: I think you’re just fundamentally misunderstanding the backward pass in an autoregressive transformer here. Only a very tiny portion of the model is exclusively trained on next-token prediction. Most of the model is trained on what might instead be called, say, conditioned future informativity.
I don’t think I am. (“conditioned future informativity”—informativity for what? …the next/last token, which is the only thing taken into account by a causal loss which masks out the rest—that’s the definition of it! everything else like packing or doing all the sub-sequences is an optimization and doesn’t change the objective.) But feel free to expand on it and explain how the tail wags the dog in causal/decoder Transformers.
You’re at token i in a non-final layer. Which token’s output are you optimizing for? i+1?
By construction, a decoder-only transformer is agnostic about which future token it should be informative for within the context limit, except in the sense that it doesn’t need to represent detail that will be more cheaply available from future tokens.
As a transformer is also unrolled in the context dimension, the architecture itself is effectively required to be generic both in what information it gathers and in where that information is used. The bias towards next-token prediction is not so much a consequence of the reward in isolation as of competitive advantage: at position i, the network has an advantage in predicting i+1 over the network at previous positions, because it has seen more recent tokens, and an advantage over the network at later positions, because it is the one that still needs to predict token i+1 at all. However, if a token is more predictive of some abstract future token than of the next token precisely (say it’s a name that might be referenced later), one would expect the dominant learnt effect to be non-myopically optimizing for later use in some timestamp-invariant way.
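A concrete way to see the gradient side of this is a small PyTorch/Hugging Face sketch I put together as an illustration (GPT-2 only because its internals are easy to get at): take the loss at one late position and check which earlier positions’ mid-layer activations receive gradient from it.

```python
# Sketch: the loss at a single late position sends gradient into mid-layer activations
# at *every* earlier position, not just the position immediately before it.
# Assumes `torch` and `transformers`; GPT-2 is just a convenient small causal LM.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The name Alice appears early and is referenced much later by Alice",
          return_tensors="pt").input_ids                # shape (1, T)

out = model(ids, output_hidden_states=True)
h_mid = out.hidden_states[6]                            # activations after block 6, shape (1, T, d)
h_mid.retain_grad()

T = ids.shape[1]
j = T - 2                                               # the position that predicts the final token
loss_j = F.cross_entropy(out.logits[:, j, :], ids[:, j + 1])
loss_j.backward()

# Nonzero for every position <= j: the mid-layer representation of an early token
# is shaped by losses far in the future, not only by predicting its own i+1.
print(h_mid.grad[0].norm(dim=-1))
```

Whether you summarize that as ‘still just next-token prediction, summed over positions’ or as ‘conditioned future informativity’ is, I think, the actual disagreement here.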
I already addressed this point. If I’m in a non-final layer then I can be optimizing for arbitrary tokens within the context window, sure, and ‘effectively’ predicting intermediate tokens because that is the ‘dominant’ effect at that location… insofar as it is instrumentally useful for predicting the final token using the final layer. Because that is where all the gradients flow from, and why the dog wags the tail.
There is no ‘the final token’ for weights not at the final layer.
Aggregations of things need not be of the same kind as their constituent things? This is a lot like calling an LLM an activation optimizer. While strictly in some sense true of the pieces that make up the training regime, it’s also kind of a wild way to talk about things in the context of ascribing motivation to the resulting network.
I think maybe you’re intending ‘next-token prediction’ to mean something more like ‘represents the data distribution, as opposed to some metric on the output’, but if you are, this seems like a rather unclear way of stating it.