Basically, I think we should expect a lot of SGD training runs to result in weights that do serial processing on inputs, refining and reshaping the content into twisted and rotated and stretched high-dimensional spaces SUCH THAT those spaces enable simple cutoff-based reasoning to “kinda really just work”.
I mostly agree with that. I expect that the SGD approach will tend to find transformations that stretch and distort the possibility space such that non-adversarially-selected instances of one class are almost perfectly linearly separable from non-adversarially-selected instances of another class.
My intuition is that stacking a bunch of linear-transform-plus-nonlinearity layers on top of each other lets you hammer something that looks like the chart on the left into something that looks like the chart on the right (apologies for the potato-quality illustration).
As such, I think the linear separability comes from the power of the “lol stack more layers” approach, not from some intrinsic simple structure of the underlying data. As such, I don’t expect very much success for approaches that look like “let’s try to come up with a small set of if/else statements that cleave the categories at the joints instead of inelegantly piling learned heuristics on top of each other”.
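To make the “hammering” intuition concrete, here is a minimal sketch (mine, standing in for the potato-quality chart; it assumes numpy and scikit-learn and a toy concentric-circles dataset): the raw 2-D inputs are nowhere near linearly separable, but the representation a small stacked-layer network learns is.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Two concentric rings: not linearly separable in the raw 2-D input space.
X, y = make_circles(n_samples=2000, noise=0.05, factor=0.5, random_state=0)
print("linear accuracy on raw inputs:",
      LogisticRegression().fit(X, y).score(X, y))              # ~0.5, i.e. chance

# "lol stack more layers": a small ReLU MLP hammers the space into a new shape.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)

# Recompute the penultimate-layer representation by hand from the learned weights.
h = X
for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
    h = np.maximum(0, h @ W + b)

print("linear accuracy on learned representation:",
      LogisticRegression(max_iter=1000).fit(h, y).score(h, y))  # ~1.0
```

The near-perfect linear separability lives in the learned, warped space, not in the original data.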
So if the “governance”, “growth rate”, “cost”, and “sales” dimensions go into certain regions of the parameter space, each one could strongly contribute to a “don’t invest” signal, but if they are all in the green zone then you invest… and that’s that?
I think that such a model would do quite a bit better than chance. I don’t think that such a model would succeed because it “cleaves reality at the joints” though, I expect it would succeed because you’ve managed to find a way that “better than chance” is good enough and you don’t need to make arbitrarily good predictions. Perfectly fine if you’re a venture capitalist, not so great if you’re seeking adversarial robustness.
I appreciate this response because it stirred up a lot of possible responses, in me, in lots of different directions, that all somehow seem germane to the core goal of securing a Win Condition for the sapient metacivilization of Earth! <3
(A) Physical reality is probably hyper-computational, but also probably amenable to pulling a nearly infinite stack of “big salient features” from a reductively analyzable real world situation.
My intuition says that this STOPS being “relevant to human interests” (except for modern material engineering and material prosperity and so on) roughly below the level of “the cell”.
Other physics with other biochemistry could exist, and I don’t think any human would “really care”?
Suppose a Benevolent SAI had already replaced all of our cells with nanobots without our permission AND without us noticing because it wanted to have “backups” or something like that…
(The AI in TMOPI does this much less elegantly, because everything in that story is full of hacks and stupidity. The overall fact that “everything is full of hacks and stupidity” is basically one of the themes of that novel.)
Contingent on a Benevolent SAI having thought it had good reason to do such a thing, I don’t think that, once we fully understood the argument in favor of doing it, we would really have much basis for objecting?
But I don’t know for sure, one way or the other...
((To be clear, in this hypothetical, I think I’d volunteer to accept the extra risk to be one of the last who was “Saved” this way, and I’d volunteer to keep the secret, and help in a QA loop of grounded human perceptual feedback, to see if some subtle spark of magical-somethingness had been lost in everyone transformed this way? Like… like hypothetically “quantum consciousness” might be a real thing, and maybe switching people over to running atop “greygoo” instead of our default “pinkgoo” changes how “quantum consciousness” works, and so the changeover would non-obviously involve a huge cognitive holocaust of sorts? But maybe not! Experiments might be called for… and they might need informed consent? …and I think I’d probably consent to be in “the control group that is unblinded as part of the later stages of the testing process” but I would have a LOT of questions before I gave consent to something Big And Smart that respected “my puny human capacity to even be informed, and ‘consent’ in some limited and animal-like way”.))
What I’m saying is: I think maybe NORMAL human values (amongst people with default mental patterns rather than weirdo autists who try to actually be philosophically coherent and end up with utility functions that have coherently and intentionally unbounded upsides) might well be finite, and a rule for granting normal humans a perceptually indistinguishable version of “heaven” might be quite OK to approximate with “a mere few billion well-chosen if/then statements”.
To be clear, the above is a response to this bit:
As such, I think the linear separability comes from the power of the “lol stack more layers” approach, not from some intrinsic simple structure of the underlying data. As such, I don’t expect very much success for approaches that look like “let’s try to come up with a small set of if/else statements that cleave the categories at the joints instead of inelegantly piling learned heuristics on top of each other”.
And:
I don’t think that such a model would succeed because it “cleaves reality at the joints” though, I expect it would succeed because you’ve managed to find a way that “better than chance” is good enough and you don’t need to make arbitrarily good predictions.
Basically, I think “good enough” might be “good enough” for persons with finite utility functions?
(B) A completely OTHER response here is that you should probably take care to NOT aim for something that is literally mathematically impossible...
Unless this is part of some clever long term cognitive strategy, where you try to prove one crazy extreme, and then its negation, back and forth, as a sort of “personally implemented GAN research process” (and even then?!)...
...you should probably not spend much time trying to “prove that 1+1=5” nor try to “prove that the Halting Problem actually has a solution”. Personally, any time I reduce a given plan to “oh, this is just the Halting Problem again” I tend to abandon that line of work.
Perfectly fine if you’re a venture capitalist, not so great if you’re seeking adversarial robustness.
Past a certain point, one can simply never be adversarially robust in a programmatic and symbolically expressible way.
Humans would have to have non-Turing-Complete souls, and so would any hypothetical Corrigible Robot Saint/Slaves, in order to literally 100% prove that literally infinite computational power won’t find a way to make things horrible.
There is no such thing as a finitely expressible “Halt if Evil” algorithm...
...unless (I think?) all “agents” involved are definitely not Turing Complete and have no emotional attachments to any questions whose answers partake of the challenges of working with Turing Complete systems? And maybe someone other than me is somehow smart enough to write a model of “all the physics we care about” and “human souls” and “the AI” all in some dependently typed language that will only compile if the compiler can generate and verify a “proof that each program, and ALL programs interacting with each other, halt on all possible inputs”?
My hunch is that that effort will fail, over and over, forever, but I don’t have a good clean proof that it will fail.
Note that I’m pretty sure A and B are incompatible takes.
In “take A” I’m working from human subjectivity “down towards physics (through a vast stack of sociology and biology and so on)” and it just kinda seems like physics is safe to throw away because human souls and our humanistically normal concerns are probably mostly pretty “computationally paltry” and merely about securing food, and safety, and having OK romantic lives?
In “take B” I’m starting with the material that mathematicians care about, and noticing that it means the project is doomed if the requirement is to have a mathematical proof about all mathematically expressible cares or concerns.
It would be… kinda funny, maybe, to end up believing “we can secure a Win Condition for the Normies (because take A is basically true), but True Mathematicians are doomed-and-blessed-at-the-same-time to eternal recursive yearning and Real Risk (because take B is also basically true)” <3
(C) Chaos is a thing! Even (and especially) in big equations, including the equations of mind that big stacks of adversarially optimized matrices represent!
This isn’t a “logically deep” point. I’m just vibing with your picture where you imagine that the “turbulent looking” thing is a metaphor for reality.
In observable practice, the boundary conditions of the equations of AI also look like fractally beautiful turbulence!

I predict that you will be surprised by this empirical result. Here is the “high church papering” of the result:

TITLE: The boundary of neural network trainability is fractal

Abstract: Some fractals—for instance those associated with the Mandelbrot and quadratic Julia sets—are computed by iterating a function, and identifying the boundary between hyperparameters for which the resulting series diverges or remains bounded. Neural network training similarly involves iterating an update function (e.g. repeated steps of gradient descent), can result in convergent or divergent behavior, and can be extremely sensitive to small changes in hyperparameters. Motivated by these similarities, we experimentally examine the boundary between neural network hyperparameters that lead to stable and divergent training. We find that this boundary is fractal over more than ten decades of scale in all tested configurations.

Also, if you want to deep dive on some “half-assed peer review of this work” hacker news chatted with itself about this paper at length.
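If you want to poke at the flavor of that result directly, here is a toy sketch (my own construction with arbitrary tiny sizes, not the paper’s actual setup; assumes numpy): sweep a separate learning rate for each layer of a tiny two-layer tanh network and record which learning-rate pairs blow up. The paper zooms in on the stable/divergent boundary of maps like this and finds fractal structure at every scale it tests.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4))            # fixed random regression data
Y = rng.standard_normal((16, 1))
W1_init = 0.5 * rng.standard_normal((4, 8))
W2_init = 0.5 * rng.standard_normal((8, 1))

def diverges(lr1: float, lr2: float, steps: int = 300) -> bool:
    """Full-batch gradient descent on a 2-layer tanh net; True if the weights blow up."""
    W1, W2 = W1_init.copy(), W2_init.copy()
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            H = np.tanh(X @ W1)                           # hidden layer
            err = H @ W2 - Y                              # residual of squared-error loss
            gW2 = H.T @ err                               # dL/dW2
            gW1 = X.T @ ((err @ W2.T) * (1 - H**2))       # dL/dW1 (tanh' = 1 - tanh^2)
            W1 -= lr1 * gW1
            W2 -= lr2 * gW2
            if not np.isfinite(W1).all() or np.abs(W1).max() > 1e6:
                return True
    return False

lrs = np.linspace(0.01, 0.3, 64)
stability_map = np.array([[diverges(a, b) for a in lrs] for b in lrs])
# e.g. matplotlib.pyplot.imshow(stability_map); zooming into the boundary between the
# stable and divergent regions is where the paper finds fractal structure.
```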
EDITED TO ADD: You respond “Lots of food for thought here, I’ve got some responses brewing but it might be a little bit” and I am happy to wait. Quality over speed is probably maybe still sorta correct. Timelines are compressing, but not so much that minutes matter… yet?
Alright, it’s been more than a “little” bit (new baby, haven’t had a ton of time), and this is not as complete a reply as I was hoping to write, but here goes:
(A) Physical reality is probably hyper-computational
My impression is almost the opposite—physical reality seems not only to contain a finite amount of information and have a finite capacity for processing that information, but on top of that the “finite” in question seems surprisingly small. Specifically, the entropy of the observable universe seems to be in the ballpark of 10^124 bits (c = 3×10^8 m/s, r_horizon = 4.4×10^26 m, so area is given by A = 4πr^2 = 2.4×10^54 m^2. The Planck length is l_p = 1.6×10^-35 m and thus the Bekenstein–Hawking entropy in natural units is just S_BH = A/4 = 9.5×10^123 nats = 1.3×10^124 bits). For context, the best estimate I’ve seen is that the total amount of data stored by humanity is in the ballpark of 10^23 bits. If data storage increases exponentially, we’re a fifth of the way to “no more data storage capacity in the universe”. And similarly Landauer gives pretty tight bounds on computational capacity (I think something on the order of 10^229 bit erasures as an upper bound if my math checks out).
So the numbers are large, but not “you couldn’t fit the number on a page if you tried to write it out” large.
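As a quick sanity check on the arithmetic above (plugging the same round numbers into the standard Bekenstein–Hawking formula S = A / (4 l_p^2); conventions and stray factors of 4 or ln 2 move the answer by a factor of a few, so treat this as an order-of-magnitude check rather than a derivation):

```python
import math

r_horizon = 4.4e26                 # radius of the observable universe, m
l_p = 1.6e-35                      # Planck length, m

A = 4 * math.pi * r_horizon**2     # horizon area, ~2.4e54 m^2
print(f"A             = {A:.2e} m^2")
print(f"A / l_p^2     = {A / l_p**2:.2e}")             # ~9.5e123 Planck areas
print(f"A / (4 l_p^2) = {A / (4 * l_p**2):.2e} nats")  # Bekenstein-Hawking entropy
print(f"              = {A / (4 * l_p**2) / math.log(2):.2e} bits")
# Either way of carving it up lands around 10^123 to 10^124 bits: astronomically
# large, but emphatically finite.
```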
but also probably amenable to pulling a nearly infinite stack of “big salient features” from a reductively analyzable real world situation
If we’re limiting it to “big salient features” I expect the number of features one actually cares about in a normal situation is pretty small. Deciding which features are relevant, though, can be nontrivial.
What I’m saying is: I think maybe NORMAL human values (amongst people with default mental patterns rather than weirdo autists who try to actually be philosophically coherent and end up with utility functions that have coherently and intentionally unbounded upsides) might well be finite, and a rule for granting normal humans a perceptually indistinguishable version of “heaven” might be quite OK to approximate with “a mere few billion well-chosen if/then statements”.
I don’t disagree that a few billion if/then statements are likely sufficient. There’s a sense in which e.g. the MLP layers of an LLM are just doing a few tens of millions of “if the residual exceeds a threshold in this direction then write a scaled copy of that direction, else noop” operations (which might be using clever high-dimensional-space tricks to emulate a few billion such operations), and they are able to answer questions about human values in a sane way, so it’s not like those values are intractable to express.
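As a minimal sketch of that reading of an MLP layer (toy dimensions, a ReLU nonlinearity, biases omitted, random placeholder weights rather than anything learned):

```python
import numpy as np

d_model, d_mlp = 8, 32                           # toy sizes; real LLMs use thousands
rng = np.random.default_rng(0)
W_in = rng.standard_normal((d_model, d_mlp))     # one "read" direction per hidden neuron
W_out = rng.standard_normal((d_mlp, d_model))    # one "write" direction per hidden neuron

def mlp_layer(residual: np.ndarray) -> np.ndarray:
    """Each hidden neuron: 'if the residual points along my read direction,
    add a scaled copy of my write direction, else noop' (i.e. a ReLU gate)."""
    update = np.zeros_like(residual)
    for i in range(d_mlp):
        activation = residual @ W_in[:, i]
        if activation > 0:                       # the if/then; same as relu(activation)
            update += activation * W_out[i]
    return residual + update                     # written back into the residual stream

x = rng.standard_normal(d_model)
# identical to the usual vectorized form: residual + relu(residual @ W_in) @ W_out
assert np.allclose(mlp_layer(x), x + np.maximum(x @ W_in, 0) @ W_out)
```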
The rules for a policy that achieves arbitrarily good outcomes according to that relatively-compactly-specifiable value system, however, might not admit “a few billion well-chosen if/then statements”. For example, it’s pretty easy to specify “earning the block reward for the next Bitcoin block would be good”, but you can’t have a few billion if/then statements which will tell you what nonce to choose such that the hash of the resulting block has enough leading zeros to earn the block reward.
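A minimal sketch of that asymmetry (a simplified stand-in for Bitcoin’s real header format and difficulty rule, not actual mining code): recognizing a good block is a one-line check, while producing one is brute-force search.

```python
import hashlib

def block_hash(header: bytes, nonce: int) -> bytes:
    """Double SHA-256 of header-plus-nonce (toy stand-in for a real block header)."""
    payload = header + nonce.to_bytes(4, "little")
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()

def is_good(h: bytes, difficulty_bits: int) -> bool:
    # The easy-to-specify value judgment: "enough leading zero bits".
    return int.from_bytes(h, "big") >> (256 - difficulty_bits) == 0

def find_nonce(header: bytes, difficulty_bits: int) -> int:
    # The hard part: no compact set of if/then rules, just ~2**difficulty_bits guesses.
    nonce = 0
    while not is_good(block_hash(header, nonce), difficulty_bits):
        nonce += 1
    return nonce

print(find_nonce(b"toy block header", difficulty_bits=16))  # ~65,000 hashes on average
```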
If I had to condense it down to one sentence it would be “the hard part of consequentialism is figuring out tractable rules which result in good outcomes when applied, not in figuring out which outcomes are even good”.
(B) A completely OTHER response here is that you should probably take care to NOT aim for something that is literally mathematically impossible...
Agreed. I would go further than “not literally mathematically impossible” and add “and also not trying to find a fully optimal solution for an exponentially hard problem with large n”.
Past a certain point, one can simply never be adversarially robust in a programmatic and symbolically expressible way.
Beautifully put. I suspect that “certain point” is one of those “any points we actually care about are way past this point” things.
(C) Chaos is a thing! Even (and especially) in big equations, including the equations of mind that big stacks of adversarially optimized matrices represent!
“The boundary of neural network trainability is fractal” was a significant part of my inspiration for writing the above. Honestly, if I ever get some nice contiguous more-than-1-hour chunks of time I’d like to write a post along the lines of “sometimes there are no neat joints separating clusters in thing-space” which emphasizes that point (my favorite concrete example is that “predict which complex root of a cubic or higher-degree polynomial Newton’s method will converge to, given a starting value” generates a fractal rather than the Voronoi-diagram-looking thing you would naively expect).
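For anyone who wants to see that example directly, here is a minimal sketch using Newton’s method on z^3 - 1 (the classic version of this basins-of-attraction picture; assumes numpy, with a coarse grid so it runs quickly):

```python
import numpy as np

roots = np.array([np.exp(2j * np.pi * k / 3) for k in range(3)])  # the three cube roots of 1

def newton_basin(z: complex, steps: int = 50) -> int:
    """Iterate Newton's method for f(z) = z^3 - 1; return the index of the nearest root."""
    for _ in range(steps):
        z = z - (z**3 - 1) / (3 * z**2)
    return int(np.argmin(np.abs(roots - z)))

n = 200
xs = np.linspace(-1.5, 1.5, n)
basin_map = np.array([[newton_basin(complex(x, y)) for x in xs] for y in xs])
# e.g. matplotlib.pyplot.imshow(basin_map): instead of three clean Voronoi-style wedges,
# the basin boundaries are fractal, with all three colors tangled along every edge.
```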
It would be… kinda funny, maybe, to end up believing “we can secure a Win Condition for the Normies (because take A is basically true), but True Mathematicians are doomed-and-blessed-at-the-same-time to eternal recursive yearning and Real Risk (because take B is also basically true)” <3
I think this is likely true. Though I think there are probably not that many True Mathematicians once the people who have sought and found comfort in math find that they can also seek and find comfort outside of math.
Quality over speed is probably maybe still sorta correct
Don’t worry, the speed was low but the quality was on par with it :)