I hit ^f and searched for “author” and didn’t find anything, and this is… kind of surprising.
For me, nothing about Harry Potter’s physical existence as a recurring motif in patterns of data inscribed on physical media in the physical world makes sense without positing a physically existent author (and in Harry’s case a large collection of co-authors who did variational co-authoring in a bunch of fics).
Then I can take a similar kind of “obtuse interest in the physical media where the data is found” when I think about artificial reward signals in digital people… in nearly all AIs, there is CODE that implements reinforcement learning signals...
...possibly ab initio, in programs where the weights, and the “game world”, and the RL schedule for learning weights by playing in the game world were all written at the same time...
...possibly via transduction of real measurements (along with some sifting, averaging, or weighting?) such that the RL-style change in the AI’s weights can only be fully predicted by knowing not only the RL schedule, but also whatever more-distant thing is being measured, well enough to predict the measurements in advance.
The code that implements the value changes during the learning regime, as the weights converge on the ideal, is “the author of the weights” in some sense (a toy sketch of this appears below)...
...and then of course almost all code has human authors who physically exist. And of course, with all concerns of authorship we run into issues like authorial intent and skill!
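To make the “code as the author of the weights” point concrete, here is a minimal hypothetical sketch, not any real system’s training loop; the names ab_initio_reward, transduced_reward, and read_sensor are illustrative assumptions. It shows both reward sources mentioned above, a reward written entirely inside a hand-authored “game world” and a reward transduced from an external measurement, each feeding the same update rule, which is the step that actually writes the weight:

```python
import math
import random

# Toy sketch only: the point is that the reward signal plus the update rule
# are "the author of the weights", whatever the reward is transduced from.

def policy_prob(weight):
    """Probability of choosing action 1 under a one-parameter policy."""
    return 1.0 / (1.0 + math.exp(-weight))

def ab_initio_reward(action):
    """Reward defined entirely inside a hand-authored 'game world'."""
    return 1.0 if action == 1 else 0.0

def transduced_reward(sensor_reading, target=20.0):
    """Reward derived from a real-world measurement; predicting it requires
    modeling the distant thing being measured, not just the RL schedule."""
    return -abs(sensor_reading - target)

weight, lr = 0.0, 0.5
for step in range(200):
    p = policy_prob(weight)
    action = 1 if random.random() < p else 0
    r = ab_initio_reward(action)  # or: transduced_reward(read_sensor()), with read_sensor() a hypothetical measurement
    grad_log_prob = (1.0 - p) if action == 1 else -p  # d log pi(action) / d weight
    weight += lr * r * grad_log_prob  # REINFORCE-style update: this line writes the weight
print(weight, policy_prob(weight))  # the weight drifts toward favoring the rewarded action
```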
It is natural, at this juncture, to point out that “the ‘author’ of the conscious human experience of pain, pleasure, value shifts while we sleep, and so on (as well as the ‘author’ of the signals fed to this conscious process from sub-conscious processes that generate sensoria, or that sample pain sensors to create subjective pain qualia to feed to the active self model, and so on)” is the entire human nervous system as a whole system.
And the entire brain as a whole system is primarily authored by the human genome.
And the human genome is primarily authored by the history of human evolution.
So like… One hypothesis I have is that you’re purposefully avoiding “being Pearlian enough about the Causes of various Things” for the sake of writing a sequence with bite-sized chunks that can feel like they build on each other, with the final correct essay and the full theory offered only at the end, with links back to all the initial essays with key ideas?
But maybe you guys just really really don’t want to be forced down the Darwinian sinkhole, into a bleak philosophic position where everything we love and care about turns out to have been constructed by Nature Red In Tooth And Claw and so you’re yearning for some kind of platonistic escape hatch?
I definitely sympathize with that yearning!
Another hypothesis is that you’re trying to avoid “invoking intent in an author” because that would be philosophically confusing to most of the audience: it explains a “mechanism with ought-powers” via a pre-existing “mechanism with ought-powers”, which then (presumably?) cannot produce a close-ended “theory of ought-powers” that starts from nothing and explains how ought-powers work from scratch in a non-circular way?
Personally, I think it is OK to go “from ought to ought to ought” in a good explanation, so long as there are other parts to the explanation… So minimally, you would need two parts that work sort of like a proof by induction. Maybe?
First, you would explain how something like “moral biogenesis” could occur in a very very very simple way. Some Catholic philosophers call this “minimal unit” of moral faculty “the spark of conscience”, and a technical term that sometimes comes up is “synderesis”.
Then, to get the full explanation and “complete the inductive proof”, the theorist would explain how any generic moral agent with the capacity for moral growth could go through some kind of learning step (possibly experiencing flavors of emotional feedback along the way) and end up better morally calibrated at the end.
Together the two parts of the theory could explain how even a small, simple, mostly venal, mostly stupid agent with a mere scintilla of moral development, and some minimal bootstrap logic, could grow over time towards something predictably and coherently Good.
(Epistemics can start and proceed analogously… The “epistemic equivalent of synderesis” would be something like a “uniform Bayesian prior” and the “epistemic equivalent of moral growth” would be something like “Bayesian updating”.)
Whether the overall form of the Good here is uniquely convergent for all agents is not clear.
It would probably depend at least somewhat on the details of the bootstrap logic, the details of the starting agent, and the circumstances in which development occurs? Like… surely in epistemics you can give an agent a “cursed prior” that makes it unable to update epistemically towards a real truth via only Bayesian updates? (Likewise I would expect at least some bad axiological states, or environmental setups, could be constructed if you wanted to make a hypothetically cursed agent as a mental test of the theory.)
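As a concrete toy illustration of both halves of that epistemic analogy (a standard textbook example, not anything from the post): starting from a uniform prior over a few coin-bias hypotheses, Bayesian updating concentrates on the true bias, while a “cursed” prior that assigns zero mass to the truth can never recover it, no matter how much evidence arrives.

```python
import random

# Three hypotheses about a coin's bias toward heads; the truth is 0.8.
hypotheses = [0.2, 0.5, 0.8]
true_bias = 0.8

def update(prior, observation):
    """One step of Bayes' rule over the discrete hypothesis space."""
    likelihood = [h if observation == "H" else (1.0 - h) for h in hypotheses]
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

posterior_uniform = [1/3, 1/3, 1/3]  # "epistemic synderesis": the minimal starting point
posterior_cursed = [0.5, 0.5, 0.0]   # cursed prior: zero mass on the true hypothesis

for _ in range(500):
    obs = "H" if random.random() < true_bias else "T"
    posterior_uniform = update(posterior_uniform, obs)
    posterior_cursed = update(posterior_cursed, obs)

print(posterior_uniform)  # concentrates on the 0.8 hypothesis (the truth)
print(posterior_cursed)   # the 0.8 entry stays at exactly zero forever
```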
So...
The best test case I could come up with for separating out various “metaphysical and ontological issues” around your “theory of Thingness” as it relates to abstract data structures (including, ultimately, perhaps The Algorithm of Goodness (if such a thing even exists)) was this smaller, simpler, less morally loaded test case...
[Image: the Lorenz attractor. Sauce is figure 4 from this paper.]
Granting that the Thingness Of Most Things rests in the sort of mostly-static brute physicality of objects...
...then noticing and trying to deal with a large collection of tricky cases lurking in “representationally stable motifs that seem thinglike despite not being very Physical” that almost all have Physical Authors...
...would you say that the Lorenz Attractor (pictured above) is a Thing?
If it is a Thing, is it a thing similar to Harry Potter?
And do you think this possible-thing has zero, one, or many Authors?
If it has non-zero Authors… who are the Authors? Especially: who was the first Author?
First, note that “the Harry Potter in JK Rowling’s head” and “the Harry Potter in the books” can be different. For novels we usually expect those differences to be relatively small, but for a case like evolution authoring a genome authoring a brain authoring values, the difference is probably much more substantial. Then there’s a degree of freedom around which thing we want to talk about, and (I claim) when we talk about “human values” we’re talking about the one embedded in the reward stream, not e.g. the thing which evolution “intended”. So that’s why we didn’t talk about authors in this post: insofar as evolution “intended to write something different”, my values are the things it actually did write, not the things it “intended”.
(Note: if you’re in the habit of thinking about symbol grounding via the teleosemantic story which is standard in philosophy—i.e. symbol-meaning grounds out in what the symbol was optimized for in the ancestral environment—then that previous paragraph may sound very confusing and/or incoherent. Roughly speaking, the standard teleosemantic story does not allow for a difference between the Harry Potter in JK Rowling’s head vs the Harry Potter in the books: insofar as the words in the books were optimized to represent the Harry Potter in JK Rowling’s head, their true semantic meaning is the Harry Potter in JK Rowling’s head, and there is no separate “Harry Potter in the books” which they represent. I view this as a shortcoming of teleosemantics, and discuss an IMO importantly better way to handle teleology (and implicitly semantics) here: rather than “a thing’s purpose is whatever it was optimized for, grounding out in evolutionary optimization”, I say roughly “a thing’s purpose is whatever the thing can be best compressed by modeling it as having been optimized for”.)
Off-the-cuff take: yes it’s a thing. An awful lot of different “authors” have created symbolic representations of that particular thing. But unlike Harry Potter, that particular thing does represent some real-world systems—e.g. I’m pretty sure people have implemented the Lorenz attractor in simple analogue circuits before, and probably there are some physical systems which happen to instantiate it.
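For concreteness, the thing in question is the standard Lorenz system with the usual parameters sigma = 10, rho = 28, beta = 8/3, i.e. the same three differential equations an analogue circuit would implement with integrators and multipliers. A minimal numerical sketch of one trajectory:

```python
# Minimal sketch of the Lorenz system with the standard parameters.
# Any trajectory started near the attractor traces out the familiar butterfly
# shape: bounded, never settling to a point, never exactly repeating.

def lorenz_step(x, y, z, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dt * dx, y + dt * dy, z + dt * dz

x, y, z = 1.0, 1.0, 1.0
trajectory = []
for _ in range(100_000):
    x, y, z = lorenz_step(x, y, z)
    trajectory.append((x, y, z))

print(trajectory[-1])  # still bounded after 100 simulated time units
```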
“Like… surely in epistemics you can give an agent a ‘cursed prior’ to make it unable to update epistemically towards a real truth via only Bayesian updates?”

Yup, anti-inductive agent.