I’m glad you shared this; it’s quite interesting. I don’t think I’ve ever had something like that happen to me, and if it did I’d be concerned, but I can believe it’s prevalent and normal for some people.
I don’t think your truth machine would work, because you misunderstand what makes LLMs hallucinate. Predicting what a maximum-knowledge author would write induces more hallucinations, not fewer. For example, say you prompted your LLM to predict text supposedly written by an omniscient oracle, and then asked “How many fingers am I holding behind my back?” The LLM would predict an answer like “three”, because an omniscient author would know that, even though it’s probably not true.
In other words, you’d want the system to believe “this writer I’m predicting knows exactly what I do, no more, no less”, not “this writer knows way more than me”. Read “Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?” for evidence of this.
What would work even better would be for the system to simply be Writing instead of Predicting What Someone Wrote, but nobody’s done that yet (because it’s hard).
I’ve been trying to put all my long-form reading material in one place myself, and I found a brand-new service called Reader that’s designed specifically for this purpose. It has support for RSS, newsletters, YouTube transcripts, and other stuff. It’s $10 per month billed annually, or $13 per month billed monthly.
Thanks for responding.
I agree with what you’re saying; I think you’d want to maintain your reward stream at least partially. However, the main point I’m trying to make is that in this hypothetical, it seems like you’d no longer be able to think of your reward stream as grounding out your values. Instead it’s the other way around: you’re using your values to dictate the reward stream. This happens in real life sometimes, when we try to make things we value more rewarding.
I think you’d end up keeping your values, because your beliefs about what you value don’t go away, and neither do the behaviors that put them into practice, at least not immediately; through those, your values are maintained (at least somewhat).
If you can still have values without reward signals that tell you about them, then doesn’t that mean your values are defined by more than just what the “screen” shows? That even if you could see and understand every part of someone’s reward system, you still wouldn’t know everything about their values?
This conception of values raises some interesting questions for me.
Here’s a thought experiment: imagine your brain loses all of its reward signals. You’re in a depression-like state where you no longer feel disgust, excitement, or anything else. However, you’re given an advanced wireheading controller that lets you easily program rewards back into your brain. With some effort, you could approximately recreate your excitement at solving problems, your disgust at the thought of eating bugs, and so on, or you could create brand-new responses. My questions:
What would you actually do in this situation? What “should” you do?
Does this cause the model of your values to break down? How can you treat your reward stream as evidence of anything if you made it? Is there anything to learn about the squirgle if you made the video of it?
My intuition says that life does not become pointless now that you’re the author of your reward stream. This suggests that the values might be fictional, but the reward signals aren’t their one true source, in the same way that Harry Potter could live on even if all the books were lost.
While I don’t have specifics either, my impression of ML research is that it’s a lot of work to get a novel idea working, even if the idea is simple. If you’re trying to implement your own idea, you’ll be banging your head against the wall for weeks or months wondering why your loss is worse than the baseline. If you try to replicate a promising-sounding paper, you’ll bang your head against the wall as your loss is worse than the baseline. It’s hard to tell if you made a subtle error in your implementation or if the idea simply doesn’t work for reasons you don’t understand because ML has little in the way of theoretical backing. Even when it works it won’t be optimized, so you need engineers to improve the performance and make it stable when training at scale. If you want to ship a working product quickly then it’s best to choose what’s tried and true.
At the start of my Ph.D. 6 months ago, I was generally wedded to writing “good code”, the kind of “good code” you learn in school and in standard software engineering these days: object-oriented, DRY, extensible, well-commented, and unit-tested.
I think you’d like Casey Muratori’s advice. He’s a software dev who argues that “clean code” as taught is actually bad, and that the way to write good code efficiently is more like the way you did it intuitively before you were taught OOP and the rest. He advises “Semantic Compression” instead: essentially, you straightforwardly write code that works, then pull out and reuse the parts that get repeated (roughly like the sketch below).
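To give a flavor of it, here’s a made-up sketch in Python; the `Screen` class and its `rect`/`text` calls are hypothetical stand-ins for a drawing API, not anything from Muratori’s material:

```python
class Screen:
    """Minimal stand-in for a real drawing API (hypothetical)."""
    def rect(self, x, y, w, h, color):
        print(f"rect at ({x},{y}) size {w}x{h} color={color}")

    def text(self, x, y, s):
        print(f"text at ({x},{y}): {s!r}")

# First pass: just write the code that works, repetition and all.
def draw_ui_first_pass(screen):
    screen.rect(10, 10, 200, 50, color="gray")   # inventory panel background
    screen.text(20, 25, "Inventory")             # inventory panel label
    screen.rect(10, 70, 200, 50, color="gray")   # spells panel background
    screen.text(20, 85, "Spells")                # spells panel label

# "Compression" pass: pull the pattern out only after it actually repeated,
# not because an upfront design said a Panel abstraction should exist.
def draw_panel(screen, x, y, label):
    screen.rect(x, y, 200, 50, color="gray")
    screen.text(x + 10, y + 15, label)

def draw_ui_compressed(screen):
    draw_panel(screen, 10, 10, "Inventory")
    draw_panel(screen, 10, 70, "Spells")

draw_ui_compressed(Screen())
```

The reusable piece falls out of the working code after the fact, instead of being designed up front.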
Yeah, I think the mainstream view of activism is something like “Activism is important, of course. See the Civil Rights and Suffrage movements. My favorite celebrity is an activist for saving the whales! I just don’t like those mean crazy ones I see on the news.”
Pacing is a common stimming behavior. Stimming is associated with autism / sensory processing disorder, but neurotypical people do it too.
This seems too strict to me, because it says that humans aren’t generally intelligent, and that a system isn’t AGI if it’s not a world-class underwater basket weaver. I’d call that weak ASI.
Fatebook has worked nicely for me so far, and I think it’d be cool to use it more throughout the day. Some features I’d like to see:
Currently tags seem to only be useful for filtering your track record. I’d like to be able to filter the forecast list by tag.
Allow clicking and dragging the bar to modify probabilities.
An option to input probabilities in formats besides percentages, such as odds ratios or bits (the conversions are straightforward; see the sketch after this list).
An option to resolve by a specific time, not just a date, plus an option for push notification reminders instead of emails. This would open the door to super short-term forecasts like “Will I solve this problem in the next hour?”. I’ve made a substitute for this feature by making reminders in Google Keep with a link to the prediction.
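For reference, here’s a minimal sketch of those conversions (my own illustration, not anything from Fatebook):

```python
import math

def prob_to_odds(p: float) -> float:
    """Return the odds ratio p : (1 - p) as a single number."""
    return p / (1.0 - p)

def prob_to_bits(p: float) -> float:
    """Return log-odds in bits; positive means more likely than not."""
    return math.log2(p / (1.0 - p))

def bits_to_prob(bits: float) -> float:
    """Inverse of prob_to_bits."""
    odds = 2.0 ** bits
    return odds / (1.0 + odds)

print(prob_to_odds(0.8))   # 4.0, i.e. 4:1 odds
print(prob_to_bits(0.8))   # 2.0 bits
print(bits_to_prob(2.0))   # 0.8
```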
When I see an event with the stated purpose of opposing highly politically polarized things such as cancel culture and safe spaces, I imagine a bunch of people with shared politics repeating their beliefs to each other and snickering, with any belief that’s actually controversial within that group met with “No no, that’s what they want you to think; you missed the point!” It seems possible to avoid that failure mode with a genuine truth-seeking culture, so I hope you succeeded.
It’s been about 4 years. How do you feel about this now?
Bluesky has custom feeds that can bring in posts from all platforms that use the AT Protocol, but Bluesky is the only such platform right now. Most feeds I’ve found so far are simple keyword searches, which work nicely for having communities around certain topics, but I hope to see more sophisticated ones pop up.
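To illustrate how little is going on in most of them, here’s a toy sketch of a keyword feed’s selection logic (my own illustration; a real AT Protocol feed generator wraps something like this in firehose consumption and a feed-skeleton endpoint, but the filtering itself is about this simple):

```python
# Hypothetical topic feed: keep any post mentioning one of these phrases.
KEYWORDS = {"forecasting", "prediction market", "calibration"}

def matches(post_text: str) -> bool:
    text = post_text.lower()
    return any(keyword in text for keyword in KEYWORDS)

posts = [
    "My calibration results for 2023",
    "Photos from my hiking trip",
    "New prediction market just launched",
]
print([p for p in posts if matches(p)])
# ['My calibration results for 2023', 'New prediction market just launched']
```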
While the broader message might be good, the study the video is about didn’t replicate.
While most people have super flimsy defenses of meat-eating, that doesn’t mean everyone does. Some people simply think it’s quite unlikely that non-human animals are sentient (besides primates, maybe). For example, IIRC Eliezer Yudkowsky and Rob Bensinger’s guess is that consciousness is highly contingent on factors such as general intelligence and sociality, or something like that.
I think the “5% chance is still too much” argument is convincing, but it raises similar questions, such as “Are you really so confident that fetuses aren’t sentient? How could you be so sure?”
I agree that origami AIs would still be intelligent if they implemented the same computations. I was trying to point at LLMs potentially being ‘sphexish’: having behaviors made of baked-in if-then patterns linked together that superficially resemble ones designed on the fly for a purpose. I think this is related to what the “heuristic hypothesis” is getting at.
The paper “Auto-Regressive Next-Token Predictors are Universal Learners” made me a little more skeptical of attributing general reasoning ability to LLMs. The authors show that even linear predictive models, basically just linear regression, can technically perform any algorithm when used autoregressively with a chain-of-thought-style scratchpad. The results aren’t that mind-blowing, but they made me wonder whether performing certain algorithms correctly with a scratchpad is as much evidence of intelligence as I thought.
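To make concrete what “a linear model used autoregressively” even means, here’s a toy sketch (my own illustration, not the paper’s construction, and it doesn’t demonstrate the universality result): the entire “model” is one linear map over a one-hot encoding of the previous token, and all the sequencing comes from feeding its outputs back in as inputs.

```python
import numpy as np

VOCAB = ["<start>", "3", "2", "1", "0", "done"]
TOK = {t: i for i, t in enumerate(VOCAB)}

# Hand-set weights encoding a countdown: <start> -> 3 -> 2 -> 1 -> 0 -> done -> done
W = np.zeros((len(VOCAB), len(VOCAB)))
for prev, nxt in [("<start>", "3"), ("3", "2"), ("2", "1"),
                  ("1", "0"), ("0", "done"), ("done", "done")]:
    W[TOK[nxt], TOK[prev]] = 1.0

def predict_next(prev_token: str) -> str:
    x = np.zeros(len(VOCAB))
    x[TOK[prev_token]] = 1.0          # one-hot "context"
    logits = W @ x                    # the entire model: a single linear map
    return VOCAB[int(np.argmax(logits))]

# Autoregressive decoding: each output becomes the next input.
seq = ["<start>"]
while seq[-1] != "done":
    seq.append(predict_next(seq[-1]))
print(seq)   # ['<start>', '3', '2', '1', '0', 'done']
```

The point the paper exploits is that the emitted tokens act as external memory: once intermediate steps get written into the sequence and read back, even a predictor this weak can, in principle, carry out long computations step by step.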
Claude finally made it to Cerulean after the “Critique Claude” component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)