Had some illusions about the C language shattered recently.
I read an article from 2018 called C Is Not A Low Level Language from ACM Queue. The long and short of it is that C fails to get the programmer “close to the metal” in any meaningful sense because the abstract machine it uses doesn’t faithfully represent anything about modern computer architecture. Instead, it hides the complexity of modern instruction sets and memory arrangements beneath the abstractions which modeled hardware very well in the 1970s.
I, like most people, thought C was the best way to do hardware aside from just writing in assembly or machine code directly. I had assumed, but never checked, that as hardware advanced the work on C development was accounting for this; it appears backwards compatibility won out. This puts us in a weird position where new hardware design and new C development are both constrained by trying to maintain compatibility with older C. In the case of hardware design, this means limiting the instruction sets of processors so they are comprehensible to C; in the case of C development it means an unyielding commitment to making people write code for the PDP-11 and relying ever more heavily on the compiler to do the real work.
The comments in the Reddit thread were, predictably, overwhelmingly focused on the semantics of the high/low level dichotomy, with half claiming Assembler is also a high level language and the other half flatly rejecting the premise of the article or playing defense about how useful C is. I feel this kind of thing misses the point, because what I liked about C is that it helped me to think about what the machine was actually doing. Now I discover I wasn’t thinking about what the machine I was working on was doing so much as thinking about the general types of things a machine is expected to do (have one processor, and memory, and disk, and IO). While this is clearly better than not knowing, it leaves what I thought was the core advantage twisting in the wind.
I therefore did a search, assuming that if old faithful had failed someone else had surely tried to fill the niche. Nothing presented itself in my searches, which combined represent an hour or so of reading. Instead I learned something interesting about programming: it is entirely upward oriented. All the descriptions of every language billed as being either for embedded systems or as “one language to rule them all” advertised themselves entirely in terms of what kind of high-level abstractions they gave access to.
Hence the old saw: dogs cannot look up; computer scientists cannot look down.
Reflecting on the failure of another embedded language to provide exactly what I wanted—a way to reason about what the machine was doing—I looked a little further afield. I know of two candidates for thinking about this problem from different directions.
The first is Verilog (now SystemVerilog), which is a Hardware Description Language (HDL). This does pretty much exactly what I want in terms of reasoning about the machine, but it is for design purposes: you describe how the hardware works, then verify the description does what you want it to, and then eventually it is instantiated in actual, physical hardware. I am not sure how or even if it could be used on existing hardware to learn things about the hardware, or to optimize tasks.
The second is a thing called the Legion Programming System, out of Stanford. This comes from the other end of the scale, targeting High Performance Computing (HPC) applications. It is not a programming language per se; rather it is a model for application development. It identifies the exact same concerns I have about taking into account modern (and future) computing architecture, but it focuses on supercomputers and huge clusters of servers.
So the first option is still looking up the ladder of abstraction, just from beneath where I want to be; the second option mostly looks at high-scale hardware from the side. I suppose the thing that would make me happiest is something like a Legion Programming System but for robotics instead of HPC. The thing I would most be able to take advantage of personally is basically just a C fork with a new abstract machine and compiler which better accounts for modern architecture. Given how much work “basically” is doing, this amounts to a completely different language that uses a C-like syntax, and it seems unlikely to appear if no one is making it already.
My wife’s iPhone put together one of its Over the Years slideshows for my birthday, and she sent it to me because—exactly as the designers hoped—it contains several good memories. I have two wonderings about this.
First, the goodness in good memories. We periodically talk about the lowest-cost, best-return psychological intervention of gratitude, which seems heavily wrapped up with the idea of deliberately reviewing good memories. I know nothing about the algorithm that is used to put together these slideshows, but it seems to me it mostly triggers on various holidays and other calendar events. I wonder if there is a way to optimize this in the direction of maximizing gratitude, perhaps as a user-controlled feature. I wonder what maximizing gratitude actually looks like, in the sense of provoking the feeling as often as possible. Is there anything like an emotional-trigger spaced repetition system, and is there any kind of emotional-experience curve distinct from the forgetting curve?
Second, speaking of the spaced repetition system, is the memory in good memories. I have traditionally been almost totally indifferent to pictures and actively hostile to the taking of them; I have seen a lot of experiences skipped or ruined outright by the insistence on recording them for the memories. I still enjoy these sorts of slideshows when they come up, which amounts to me free riding on the labor of chronicling our lives. Sometimes they capture moments I have forgotten, which suggests there is some value in deliberately reviewing one’s life.
A stark case is Alzheimer’s. The classic horror is the forgetting of loved ones; if we took something like the Over the Years slideshows and put them on an SRS schedule, would people suffering from Alzheimer’s be able to remember their loved ones for longer? Can we even model Alzheimer’s as something like a faster forgetting curve?
Expanding the idea beyond treatment into a tool, what about for networking either professionally or socially? There already seems to be an abundant supply of photos and videos with brief acquaintances. Would anyone be interested in hijacking the slideshows for deliberately remembering the names of people they met at a party or professional convention?
A skilled hacker could install themselves in your memories. Add their face to your photos, first as one of many people in the background, then gradually more and more, until you would start believing they were the best friend in your childhood. Each time you review your memories, you create fake memories of your “friend”.
Also, consider the possibilities this would offer for advertising. I bet you didn’t notice that each of your best memories involves you holding a bottle of Coke. But the more you review the photos, the more obvious it becomes.
This should also be transformed into a horror movie in the direction of Ring and Insidious, where each time we view the photos the malevolent spirit gets more integrated into our lives, and when it shows up in a best-friends-forever photo alone with us, it gets to assume our lives.
Architecture: neural nets are a family of architectures. Transformers are an example of a specific architecture.
Meta-algorithm: the arrangement of and training procedure for the stack of transforms; this seems to be what people are referring to when they say algorithmic improvements. I think of it this way because it is an algorithm-finding-algorithm.
Algorithm: the learned algorithm found by the meta-algorithm, being used to handle inputs in production. There is high uncertainty about what exactly these are, since we can only easily see the weights/activations.
Algorithmic uncertainty: by default we don’t know what the algorithm is in any detail. Mechanistic Interpretability is about reducing algorithmic uncertainty.
Scale compute. This means stuff like GPUs, TPUs, total number of FLOPs, etc.
Scale data. Normally this means unstructured data; I don’t actually have any sense of what scaling structured data looks like other than it really seemed to help AlphaFold.
Scale models. This means increasing the number of parameters which are trained, which seems to correlate with compute but strictly speaking should be independent of it.
But none of the conversations seem to match these divisions, which means at least I am out of sync with everyone else, and probably that I am wrong in several dimensions.
I think social media has brought us to a point in the development of communication where it could now be considered communication pollution. I model this like light:
In short, communication is so convenient there is no incentive to make it good. Since there is no incentive to make it good, the bad (by which I mean low-quality) radically outnumbers the good, and eats up the available attention-bandwidth.
Now light is so cheap it is essentially free, and so pervasive it causes harm.
At the same time most people underconsume light, both in the amount of lumens they have available and in the ability to set light color differently at different times of the day.
This appears to be the strategy the organizations dedicated to darkening are pursuing.
Popular example: many streetlights throw light directly into the sky or sideways. These organizations advocate for choosing the more expensive streetlights that effectively direct the light to the walking areas people actually use. The net result is usually a better-illuminated sidewalk.
I’ve seen research and demo-houses that employ the different light over the course of the day approach, but I have not seen anything about trying to get developers to offer it. In this way it falls into the same gap as any given environmental improvement in home/apartment building: people would probably buy it if it were available, but don’t have the option to select it because it isn’t offered; there’s no real way to express demand to developers short of people choosing to do independent design/builds.
I feel like this should also be handled by a contractor, but there’s not much in the way of ‘lighting contractors’ to appeal to; it seems like an electrician would have to shift their focus, or an interior decorator would have to expand into more contractor-y work.
I’ve seen research and demo-houses that employ the different light over the course of the day approach, but I have not seen anything about trying to get developers to offer it.
The technology is commercially sold with Philips Hue (and a few lesser-known brands). It’s just a matter of putting the right light bulbs into your existing fixtures and setting up when you want which light.
My room is red at the end of the day and my screen is reddish via f.lux.
Yeah. Digital communication removes costs such as paying for paper etc. But after all these costs are gone, two remain:
the time and work it cost you to produce the information;
the time and work it costs other people to select the information they want.
Here the relation is a bit adversarial: the delivery of the right information to the right reader is a cost that the writer and the readers must split somehow.
The writer may pay the cost by thinking twice whether something should be written at all, by doing some research and writing it well, perhaps by tagging it somehow so the readers can easily (preferably automatically) decide whether they are interested. Or the writer may simply throw around many scraps of content, of random topic, length, and quality, and let the readers sort it out.
This is partially about the writers (choosing to write about something, doing your homework first) and partially about the publishing systems (do they support tagging and filtering by tags, preferentially following your favorite writers, is there a possibility to mark something as “important” and a punishment for abusing this flag). Maybe the systems are frustrating because designing a non-frustrating system is extremely difficult. Or maybe the systems are frustrating because anything more complicated than “throw around scraps of content” would be too complicated to use for most users.
And the advertising! How could I expect companies to give me only the content I am interested in and filter out everything annoying, if “giving me the annoying things” is precisely their business model? So not only are good systems difficult to design and unlikely to become popular, there is even smaller incentive to create them.
How could I expect companies to give me only the content I am interested in and filter out everything annoying, if “giving me the annoying things” is precisely their business model? So not only are good systems difficult to design and unlikely to become popular, there is even smaller incentive to create them.
I don’t think that either Google or Facebook is well described as “giving you the annoying things”. Both companies invest in giving people ads that they might engage with.
This happens both because they want to keep their users and don’t push them away and because the people who appreciate a given ad are most likely to engage with it.
If I search for a new microwave on Google and then get a bunch of new microwave ads everywhere I browse I’m likely getting those at the time of my life where I’m least bothered by them given that I actually want to buy a microwave.
If I search for a new microwave on Google and then get a bunch of new microwave ads everywhere I browse I’m likely getting those at the time of my life where I’m least bothered by them given that I actually want to buy a microwave.
When I buy a microwave, I typically open the websites of the few shops I trust, search for the products, compare their parameters, and read the reviews. I don’t really understand what extra value I get from Google showing me more microwave ads when I am trying to listen to some music on YouTube afterwards. My priors for Google showing me a better microwave are quite low. (But this is all hypothetical, because Google will show me the ad for Grammarly regardless.)
This happens both because they want to keep their users and don’t push them away and because the people who appreciate a given ad are most likely to engage with it.
My knowledge may be obsolete, but the last time I checked, there were essentially two models: “pay per view” and “pay per click”. Given that I am the kind of user who almost never clicks on ads, it makes more sense to show me the “pay per view” ones, right?
But then, the income from ads shown to me is proportional to the number of ads shown to me. So the optimal amount of ads to show me is… smaller than the amount that would make me stop using the website… but not much smaller. Did I get my math wrong somewhere?
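For what it’s worth, a toy version of that math (my own sketch; the per-view rate and the retention curve are invented assumptions, not anything the platforms publish):

```python
# Toy model: per-view revenue r, and a made-up retention curve giving the
# probability I keep using the site when shown n ads per session.
r = 0.002  # dollars per ad view (illustrative number only)

def retention(n):
    return max(0.0, 1.0 - 0.01 * n**1.5)  # assumed shape: slow decay, then steep

def expected_revenue(n):
    return n * r * retention(n)

best = max(range(0, 100), key=expected_revenue)
print(best, expected_revenue(best))
```

Under that made-up curve the revenue-maximizing ad load comes out noticeably below the load at which retention hits zero, but within the same ballpark, which matches the “smaller, but not much smaller” intuition.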
I was trying out a different intuition for articulating the problem of AGI, and thought I would ask ChatGPT about it. I am curious how GPT-4 would respond differently, and may ask some of the other LLMs about it.
Imagine a general optimizer function which optimizes the value of any function (or functions) passed to it. We pass a set of functions, A, to the general optimizer function. What happens to the value of all other functions, ~A?
The behavior of the general optimizer function towards functions outside of the set A would depend on the implementation of the function and the optimization algorithm used.
In general, the optimizer function would focus on optimizing the value of the functions in set A, and may not directly consider the value of the other functions. However, the optimization process could indirectly impact the values of other functions, especially if the functions in set A are dependent on the values of other functions.
For example, if the functions in set A are used as inputs to other functions, then optimizing the value of the functions in set A could indirectly impact the values of these other functions. Additionally, some optimization algorithms may explore the space of function values beyond the set A, in which case the values of functions outside of A could also be impacted.
In summary, the behavior of the general optimizer function towards functions outside of the set A would depend on the implementation and specific circumstances, but in general, the optimizer would focus on optimizing the values of the functions in set A.
If there are finite resources, and every function requires resources, then what happens to the functions outside of the set A?
If there are finite resources and every function requires resources, then the resources allocated to optimize the functions in set A would be taken away from other functions. This could result in several outcomes for functions outside of set A:
Some functions outside of set A may continue to operate as usual, as they do not depend on the resources that are being allocated to optimize the functions in set A.
Some functions outside of set A may experience reduced performance or functionality, as resources are redirected towards optimizing the functions in set A.
Some functions outside of set A may fail completely if they are highly dependent on resources that are being allocated to optimize the functions in set A.
The specific outcome for each function outside of set A would depend on the relative importance of the function, its resource requirements, and the resources being allocated to optimize the functions in set A.
Therefore, in situations where there are finite resources and every function requires resources, it is important to carefully consider the trade-offs between optimizing a set of functions and the impact on other functions that are not part of that set. It may be necessary to prioritize certain functions over others, or to optimize multiple sets of functions in a phased approach to minimize the impact on other functions.
If the general optimizer function is among the functions in set A, and there are finite resources, and every function requires resources, then what happens to the functions not in A?
If the general optimizer function is included in the set of functions A, then optimizing the functions in set A would also include optimizing the general optimizer function itself.
If there are finite resources and every function requires resources, then the resources allocated to optimize the functions in set A, including the general optimizer function, would be taken away from other functions not in set A. This could lead to the outcomes discussed in my previous response, where the functions outside of set A may experience reduced performance or fail completely if they are highly dependent on resources that are being allocated to optimize the functions in set A.
In addition, if the general optimizer function is not optimized effectively, then it may not be able to optimize the other functions in set A as well as it could. This could result in suboptimal solutions or increased resource consumption for the optimization process.
Therefore, if the general optimizer function is included in the set of functions A, it is important to ensure that the resources allocated to optimizing the general optimizer function are carefully balanced with the resources allocated to optimizing the other functions in set A, as well as any other functions outside of set A that are important to the overall system. This may involve optimizing the general optimizer function separately, or using a phased approach to optimize different sets of functions.
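To make the finite-resource point concrete, here is a toy sketch of my own (not part of the ChatGPT exchange; the functions and numbers are arbitrary). The optimizer only ever evaluates members of A, so everything outside A is starved by construction:

```python
# Toy sketch: finite resources, every "function" needs resources to produce value.
# The optimizer only looks at the set A, so ~A gets whatever is left over: nothing.
resources_total = 100.0

functions = {"a1": lambda r: 3 * r, "a2": lambda r: 2 * r,   # in A
             "b1": lambda r: 5 * r, "b2": lambda r: 1 * r}   # not in A
A = {"a1", "a2"}

def optimize(functions, A, budget, step=1.0):
    alloc = {name: 0.0 for name in functions}
    while budget >= step:
        # Greedily give the next unit to whichever member of A gains the most from it.
        best = max(A, key=lambda n: functions[n](alloc[n] + step) - functions[n](alloc[n]))
        alloc[best] += step
        budget -= step
    return alloc

alloc = optimize(functions, A, resources_total)
print(alloc)            # all 100 units go to a1; a2, b1, b2 get 0
print({n: functions[n](alloc[n]) for n in functions})
```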
I think we do specialization of labor wrong a lot. This makes me think there is a lot to be gained from a sort of back-to-basics approach, and I feel like knowledge work in general and research in particular are good candidates.
The classic example of specialization of labor compares the output of a single blacksmith making pins with one tenth of the output of a factory where 10 men produce pins together, each responsible for only 1-3 tasks in the pin-making process. Each man gets very good at his appointed tasks, and as a consequence the factory’s per-worker output is 4,800 pins per day, while the lone metalworker’s output is perhaps 20 pins per day.
Imagine for a moment how research is done: there is a Principal Investigator (PI) who runs a lab, who works with grad students and postdocs. Normally when the PI has a research project, he will assign the work to his different assistants, with so-and-so gathering the samples, and so-and-so running them through the equipment, and so-and-so preparing some preliminary analysis. In this way the labor is divided; Adam Smith called it division of labor; why don’t I think it counts?
Because they aren’t specializing. The pitch with the pin factory is that each worker gets very good at their 1-3 pin-making tasks. The division of labor is ad-hoc in the research case; in fact the odds are good that the opportunity to get very good is effectively avoided, because each assistant is expected to be competent in each of the relevant tasks.
This is because a scientist is a blacksmith-of-abstractions, and the grad students and postdocs are the apprentices. The point is only mostly to produce research; the other point is to turn the assistants into future PIs themselves.
This is how “artisanal” small research labs work, but larger research groups usually have more specialisation—especially in industry, which happens to have almost all the large-scale research groups. People might specialise in statistics and data analysis, experimental design, lab work, research software engineering, etc. Bell Labs did not operate on the PI model, for example; nor does DARPA or Google or …
I’d be interested in hearing more about how Google runs things, which I have no knowledge of; I note that Bell Labs and DARPA are the go-to examples of standout production, and one of them is defunct now. This also scans with (what I understand to be) the usual economic finding that large firms produce more innovation on average than small ones.
I have other examples from industry I’ve been thinking about where the problem seems to be dividing labor at the wrong level (or maybe not applying it completely at every level?). The examples I have in mind here are pharmaceutical companies, who underwent a long series of mergers starting in the 80s.
The interesting detail here is that they did do specialization very similar to the pin example, but they didn’t really account for research being a much broader problem than pin-making. Prior to the big downsizing of labs in pharma and chemicals, the story went that all these labs during mergers had been glommed together and ideas would just be put in one end and then either a working product came out the other end or not; there was no feedback loop at the level of the product or idea.
This looks to me like a case where small university labs are only specialized at the meta level, and the failing industrial labs are only specialized at the object level. It feels to me like if there were a graceful way to describe the notion that specialization has dimensionality, we’d be able to innovate better.
Does anyone know of a good tool for animating 3d shapes? I have a notion for trying to visualize changes in capability by doing something like the following:
A cube grows along the z axis according to some data
When a particular threshold is reached, a new cube branches off from that threshold, and extends in the x or y axis (away from the “trunk”)
When a particular threshold is reached on that cube, the process repeats
In this way there would be kind of a twisted tree, and I could deploy our intuitions about trees as a way to drive the impact. I thought there would be something like a D3 for these kinds of manipulations, but I haven’t found it yet.
There is http://www.povray.org/ which renders 3D pictures and animations similarly to Blender, except that instead of having an editor, you describe the shapes by equations.
Like, you can specify that a cube is at some coordinates, scaled and rotated depending on the time variable, and conditional on time being greater than some value, another cube exists, etc.
If the animation you want to make is simple from the mathematical perspective, you could either write the POV-Ray code that animates the shapes directly, or write a script in your favorite language that outputs the recipes for rendering the static picture for each individual frame.
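A minimal sketch of that second approach (the growth rate, threshold, and output format are all placeholder assumptions; the per-frame records could just as easily be POV-Ray box statements):

```python
# Sketch: a trunk cube grows along z with time; when it passes a threshold,
# a branch cube appears and grows away from the trunk. Each frame's shapes are
# written out as plain records that a renderer could consume.
THRESHOLD = 5.0   # placeholder value
FRAMES = 100

def shapes_at(t):
    trunk_height = 0.1 * t                    # grows with the time/data variable
    shapes = [("trunk", 0.0, 0.0, trunk_height)]
    if trunk_height > THRESHOLD:              # conditional existence, as described above
        branch_length = 0.1 * (t - THRESHOLD / 0.1)
        shapes.append(("branch", 0.0, THRESHOLD, branch_length))
    return shapes

for frame in range(FRAMES):
    with open(f"frame_{frame:03d}.txt", "w") as f:
        for name, x, z, size in shapes_at(frame):
            f.write(f"{name} at x={x} z={z} size={size}\n")
```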
Is spaced repetition software a good tool for skill development or good practice reinforcement?
I was recently considering using an Anki prompt to do a mental move rather than to test my recall, like tense your muscles as though you were performing a deadlift. I don’t actually have access to a gym right now, so I didn’t get to put it into action immediately. Visualizing the movement as vividly as possible, and tensing muscles like the movement was being performed (even when not doing it) are common tricks reported by famous weightlifters.
But I happened across an article from Runner’s World today which described an experiment where all they did was tell a group of runners the obvious things that everyone already knows about preventing injury. The experimental group saw ~13% fewer injuries.
This suggests to me that my earlier idea is probably a good one, even though it isn’t memory per se. The obvious hitch is that what I am after isn’t actually recall—it isn’t as though runners forget that overtraining leads to injury if you were to ask them, and I have never forgotten how to do a deadlift.
Rather the question is how to make it as correct and instinctive as possible.
This feels like a physical analogue of my earlier notion about Ankifying the elements of a problem, so as to integrate it into my perspective and notice relevant information.
Maybe a better way to say this is using an Anki prompt to help respond to a physical prompt, that being the task itself.
A physical action performed instinctively in response to a physical task already has a name; it is called muscle memory.
Personally I’ve found the biggest problem with spaced repetition for skills and habits is that it’s contextless.
Adding the context from multiple skills with different contexts makes it take way more time, and not having the context makes it next to useless for learning the skills.
Personally I’ve found the biggest problem with spaced repetition for skills and habits is that it’s contextless.
Could you talk a bit more about this? My initial reaction is that I am almost exactly proposing additional value from using Anki to engage the skill sans context (in addition to whatever actual practice is happening with context).
I review Gwern’s post pretty much every time I resume the habit; it doesn’t look like it has been evaluated in connection with physical skills.
I suspect the likeliest difference is that the recall curve is going to be different from the practice curve for physical skills, and the curve for mental review of physical skills will probably be different again. These should be trivial to adjust if we knew what they were, but alas, I do not.
Maybe I could pillage the sports performance research? Surely they do something like this.
My take is pretty similar to cognitive skills: It works well for simple motor skills but not as well for complex skills.
My initial reaction is that I am almost exactly proposing additional value from using Anki to engage the skill sans context (in addition to whatever actual practice is happening with context).
My experience is basically that this doesn’t work. This seems to track with the research on skill transfer (which is almost always non-existent or has such a small effect that it can’t be measured.)
I have some cards in my Anki collection that ask me to review dance moves, and I have found that it is helpful for making sure my body remembers how to do them.
I wonder how hard it would be to design a cost+confidence widget that would be easily compatible (for liberal values of easy) with spreadsheets.
I’m reading a Bloomberg piece about Boeing which quotes employees talking about the MAX as being largely a problem of choosing the lowest bidder. This is also a notorious problem in other places where there are rules which specify using the lowest cost contractor, like parts of California and many federal procurement programs. It’s a pretty widespread complaint.
It would also be completely crazy for it to be any other way, for what feels like a simple reason: no one knows anything except the quoted price. There’s no way to communicate confidence simply or easily. The lowest bid is easily transmitted through any spreadsheet, word document, or accounting software, quality of work be damned. Any attempt to even evaluate the work is a complicated one-off report, usually with poorly considered graphics, which isn’t even included with the budget information so many decision makers are unlikely to ever see it.
So, a cost+confidence widget. On the simple end I suppose it could just be a re-calculation using a reference class forecast. But if I’m honest what I really want is something like a spreadsheet where each cell has both a numerical and graphical value, so I could hover my mouse over the number to see the confidence graph, or switch views to see the confidence graphs instead of the values.
Then if we’re getting really ambitious, something like a time+confidence widget would also be awesome. Then when the time+confidence and cost+confidence values are all multiplied together, the output is like a heat map of the two values, showing the distribution of project outcomes overall.
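A rough sketch of what such a widget might compute under the hood (the distributions and the carrying-cost figure are arbitrary assumptions, just to show the shape of the thing):

```python
# Sketch: each "cell" is a distribution rather than a point estimate. Combining the
# cost and time distributions by simulation gives the project-outcome picture.
import random

def cost_estimate():
    return random.lognormvariate(4.0, 0.3)    # assumed cost distribution ($k)

def time_estimate():
    return random.lognormvariate(2.0, 0.5)    # assumed schedule distribution (months)

samples = [(cost_estimate(), time_estimate()) for _ in range(10_000)]

# A crude "confidence graph": percentiles of total cost, where a per-month
# carrying cost stands in for whatever the real schedule-cost model would be.
totals = sorted(c + 10.0 * t for c, t in samples)
for q in (0.1, 0.5, 0.9):
    print(f"P{int(q * 100)}: {totals[int(q * len(totals))]:.1f}")
```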
Are math proofs useful at all for writing better algorithms? I saw on Reddit recently that they proved Batchelor’s Law in 3D, the core idea of which seems to be using stochastic assumptions to prove it cannot be violated. The Quanta article does not seem to contain a link to the paper, which is weird.
Batchelor’s Law is the experimentally-observed fact that turbulence occurs at a specific ratio across scales, which is to say when you zoom in on a small chunk of the turbulence it looks remarkably like all of the turbulence, and so on. Something something fractals something.
Looking up the relationship between proofs and algorithms mostly goes to proofs about specific algorithms, and sometimes using algorithms as a form of proof; but what I am after is whether a pure-math proof like the above one can be mined for useful information about how to build an algorithm in the first place. I have read elsewhere that algorithmic efficiency is about problem information, and this makes intuitive sense to me; but what kind of information am I really getting out of mathematical proofs, assuming I can understand them?
I don’t suppose there’s a list somewhere that handily matches tricks for proving things in mathematics to tricks for constructing algorithms in computer science?
A proof may show that an algorithm works. If the proof is correct*, this may demonstrate that the algorithm is robust. (Though you really want a proof about an implementation of the algorithm, which is a program.)
*A proof that a service will never go down which relies on assumptions with the implication “there are no extreme solar storms” may not be a sufficient safeguard against the possibility that the service will go down if there is an extreme solar storm. Less extremely, perhaps low latency might be proved to hold, as long as the internet doesn’t go down.
How are algorithms made, and how can proofs improve/be incorporated into that process?
Given a problem, you can try and solve it (1). You can guess (2). You can try (one or more) different things and just see if they work (3).
1 and 2 can come apart, and that’s where checking becomes essential. A proof that the method you’re using goes anywhere (fast) can be useful there.
Let’s take a task:
Sorting. It can be solved by:
1. Taking a smaller instance, solving that (and paying attention to process). Then extract the process and see how well it generalizes
2. Handle the problem itself
3. Do something. See if it worked.
2 and 3 can come apart:
At its worst, 3 can look like Bogosort. Though that process can be improved. Look at the first two elements. Are they sorted? No: shuffle them. Look at the next two elements...
4! = 24, twenty-four permutations of 4 elements. The sorting so far has eliminated all but six possibilities:
1, 2, 3, 4
1, 3, 2, 4
1, 4, 2, 3
2, 3, 1, 4
2, 4, 1, 3
3, 4, 1, 2
Now all that’s needed is a method of shuffling that doesn’t make things less orderly… And eventually Mergesort may be invented.
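For concreteness, this is roughly where that line of improvement ends up; the sketch below is just textbook mergesort, not anything specific to the comment above:

```python
def merge_sort(xs):
    # "Shuffling that doesn't make things less orderly": split, sort each half,
    # then interleave the two already-sorted halves.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([2, 4, 1, 3]))  # [1, 2, 3, 4]
```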
In the extreme, 3 may be ‘automated’:
programs write programs, and test them to see if they do what’s needed (or a tester gets a guesser thrown at it, to ‘crack the password’)
The post you linked to (algorithmic efficiency is about problem information) - the knowledge that method X works best when conditions Y are met, which is used in a polyalgorithmic approach? That knowledge might come from proofs.
How many kinds of environmental and/or building permits could be gotten by company A, and then used for company B’s project, after company B buys company A? For example, could we form a special purpose company to pursue navigating all the NEPA stuff for building a power plant at a particular site, and then sell that company to a power company, who could then begin construction immediately?
Consider real estate developers: one company buys the land and develops it, which means do all the construction, and then sells it to other companies who use the buildings.
I propose an extension of this model, where one company will buy the land and do the legal/regulatory work, and then sell the site+permits to another company who will do the construction of the facility.
The advantage here is that rather than one company taking on the whole risk of the land and permitting process through to completion of the needed facility, the land and permitting can be handled by a special purpose company which can be sold, construction-ready, to a market of potential buyers. Because there are multiple potential buyers, the risks are lower than they are for one company doing the whole process alone. (Normally you can’t sell permits/licenses/etc. once they are granted, so if the project falls through for some reason, the work done on the regulatory side is simply wasted.)
If widely adopted as a practice, what I hope this would allow is for a lot of potential building sites for needed infrastructure to be prepared for construction simultaneously. While this will do nothing to speed up projects starting now, it would potentially cut years off the construction time of the next generation of facilities, since an abundance of sites would already be waiting for them.
Pays out faster than regular real estate development on a within-sector basis.
Here I envision industrial/infrastructure uses, but for areas that are sufficiently difficult (New York or California, for example) it could be viable for things like housing in combination with traditional development (permit company → development company → homebuyer).
You almost definitely cannot get “general construction” permits whereby you can build anything, which on the one hand is limiting and requires strategic choices in permitting.
On the other hand it allows some strategic influence over the development of useful sites.
I wonder how feasible it would be to make something that would allow people to coordinate for consumption like unions for bargaining, parties for politics, or corporations for production. In particular I have in mind targeting entertainment and social media consumption because of the prevalence of recommendation algorithms here.
The intuition is that these algorithms are political computations: who gets the advertising traffic, what type of content is in demand, which consumers are catered to.
The thing being coordinated on is the consumer behavior; namely what people click on.
Examples:
like a union, but instead of collective bargaining to sell labor, it is collective behavioring[1] to consume content.
like the old website Massdrop, but instead of coordinated buying it is coordinated clicking.
like a political party, but instead of coordinating votes, it coordinates clicks.
like a company, but instead of coordinating work, it coordinates consumption.
Concretely, I have in mind something like a browser widget that overlays indicators of the most impactful consumption choice to make to influence the algorithm in the direction you want, such as an icon or a color filter. This would run over things like YouTube, Instagram, TikTok, etc.
Another way to think of this: a value handshake between whatever algorithm the company is using and an algorithm you would choose instead.
behavioring—by this I mean consciously conforming to a pattern of behavior so it shows up in behavior analysis, in this case consumer behavior tracking which is part of the function of recommendation algorithms.
I read a story once about a billionaire who managed to flout a zoning limitation by building himself an absurdly huge high-rise single apartment. Once that was done, he waited a bit, and either using a loophole that already existed or one he finagled into the law, he added a bunch of walls inside his personal mega-apartment and then sold off all these new units as regular apartments.
Now I think of this in the context of the American dredging and shipping laws. There’s a lot of rules surrounding dredges and shipping vessels, but there are a lot fewer surrounding luxury items like yachts. So imagine a group of investors building a big, state of the art shipyard that builds yachts, but which can be converted for little or no additional cost to build dredges or vessels suitable for river shipping or similar.
A simpler alternative scheme is to just comply with the laws but use yachts or similar less-regulated vessels as a lead product to build capacity for the more economic ones, analogous to how Tesla strategically built an electric sports car to generate interest and funds for eventually building electric sedans.
Does compression research have any analogue of training behavior in ML research?
Watched a talk by Ilya Sutskever about generalization; during the course of it he covered the equivalence of prediction and compression, and then went further into maximum likelihood. The talk was motivated by how thinking about this has been a helpful perspective for him since 2016 or so.
In the talk he makes an analogy to Kolmogorov complexity, and then extends it so that conditional Kolmogorov complexity can be thought of as Kolmogorov compression and uses that when talking about why unstructured learning might work.
At 38:30 there’s a question/comment about where the Kolmogorov analogy breaks down, which was that in Kolmogorov the order of the data doesn’t matter but in training neural networks it does, and as a result they have training behaviors.
Is this something that has ever shown up in compression research, about which I know almost nothing? I realize in the typical case it is mostly for things that we can read all of simply enough, like files on a computer, and compression is mostly for trading off resources like storage or bandwidth against time or CPU, but that’s because we have really well developed applied cases. Does compression research contain work on “general compressor” algorithms the way a transformer is a general predictor? Do compression algorithms experience compression behaviors the way a neural network experiences training behaviors?
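As a toy illustration of the prediction-compression link (my own example, not from the talk): a predictor that assigns higher probability to the next symbol lets an ideal compressor spend fewer bits on it, at −log2 p bits per symbol.

```python
import math
from collections import Counter

text = "abababababababab"

# Order-0 "predictor": probability of each character from its frequency.
counts = Counter(text)
p = {ch: counts[ch] / len(text) for ch in counts}

# Ideal code length in bits under that predictor (Shannon: -log2 p per symbol).
bits_model = sum(-math.log2(p[ch]) for ch in text)
bits_raw = 8 * len(text)  # naive 1 byte per character

print(bits_model, bits_raw)  # the better the prediction, the shorter the code
```

An order-1 predictor that has learned “a is always followed by b” would push the per-symbol cost toward zero, which is the sense in which better prediction is better compression.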
The categories of audience are too vague; my principal interest is in policymakers, so dedicate some time to analysis of (and by extension strategy for) this group.
This will require some consideration of how policy actually gets done. Big question, but I have a line of attack on that problem.
Be explicit about the rhetoric. This will provide the context for the strategy, a pattern people may mimic for future strategies, and as a nice bonus will serve as advocacy for the skillset. (Assuming it doesn’t suck)
What I expect to be able to come up with is a description of what kind of policymaker we want to convince, and a flexible plan of attack for identifying and then approaching ones who seem to fit that description. This stems from a hypothesis that policy change is accomplished by a small cadre of policymakers dedicated to the policy in question; and not accomplished by popularity, even among policymakers.
What exactly the dedicated cadre does is unclear to me, but my suspicion is that it mostly comes down to attention to detail and opportunism. Attention to detail because there are a lot of levels to making policy work that go beyond getting a bill passed, and only a dedicated person would be willing to invest that attention; opportunism because a dedicated cadre can rapidly deliver a finished product with most of the kinks worked out as soon as a window opens.
Saw a YouTube video by a guy named Michael Penn about why there is no 3-dimensional equivalent of the complex numbers. It goes through an abridged version of the mathematical reasons, and I was able to follow along until we got to a point where he showed that it would have to commute ix with xi, which contradicts an initial required claim that ix does not commute with xi.
This is not satisfying to me intuitively. The thing that bothers me is that I can accept the argument that the definition is incoherent, but that doesn’t show me why we can’t get there using some different claim or procedure. Here’s what I came up with instead:
When we build the complex numbers out of the reals, rather than extending the reals by one dimension, what we are really doing is extending the reals by the size of the reals. So rather than:
reals + one dimension ⇒ complex
We have:
reals + reals ⇒ complex
Carrying this forward, extending the complex numbers to the next level up is applying the same procedure again, so rather than:
complex + one dimension ⇒ ?
We have:
complex + complex ⇒ quaternions
So if we do the thing we did to construct the complex numbers from the reals over again, what we get is the quaternions which have four dimensions rather than three.
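This doubling move has a standard name, the Cayley-Dickson construction; a minimal sketch (using one common sign convention) that builds the quaternion units out of pairs of complex numbers:

```python
# Cayley-Dickson doubling: a "doubled" number is an ordered pair (a, b) of
# numbers from the level below, with (a, b)(c, d) = (ac - conj(d)b, da + b conj(c)).
def cd_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - d.conjugate() * b, d * a + b * c.conjugate())

# Build quaternion units as pairs of complex numbers.
one, i = (1 + 0j, 0j), (1j, 0j)
j, k = (0j, 1 + 0j), (0j, 1j)

print(cd_mul(i, j) == k)          # True: i*j = k
print(cd_mul(j, i) == (0j, -1j))  # True: j*i = -k, so multiplication stops commuting
```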
I note that this intuition really answers the question of what you get when you extend the complex numbers, rather than the question of why you can’t have something like the complex numbers with three dimensions. For that, I think of the previous problem in reverse: In order to build a three dimensional number system using anything like the same procedure, we would need to have something extended by its own size to get us there:
something + something ⇒ threenions
Since the dimension is supposed to be three, that means we need a base number system of dimension 1.5. That’s a fraction; we might be able to do this, since fractional dimensions are how a fractal is made. But the fraction for 1.5 is 3⁄2, which means we need a three dimensional number system in order to construct our three dimensional number system, which is circular and suggests there isn’t a way to construct it (at least not one from the reals or complex numbers).
The video sounded too complicated. My own “proof” is imagining that we already have a number system with independent units 1, i, and j, and asking how much would be i×j. Plausible answers are 1, −1, i, -i, j, or -j, and each of them quickly results in a problem. For example, if i×j=1, then j is actually 1/i, which already exists as a complex number, so it is not an independent third dimension. But if i×j=i, then j=1. Etc.
To put j outside the plane defined by 1 and i, the result of i×j must be yet another dimension, let’s call it k… and we have reinvented quaternions.
Quick question: could we use the Gato trick of multi-task → single-model in reverse, such that we exclude tasks?
The idea is that we would specifically create training on “bad” tasks, like connect to the internet, or write code, and then build a single model which includes the “good” ones but excludes the “bad” ones.
Based on my understanding of how these things work there’s no sense in which the tasks would be rejected exactly; rather what I imagine is a kind of pathological underperformance. An analogy would be giving GPT-3 catastrophic dyslexia on purpose.
The natural downside is that we will deliberately be building components that are pretty good at bad tasks, which is dangerous. But our bigger problem is failing to screen out bad things and so they happen accidentally, and this feels at first blush like an option for making incremental progress, at least.
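I don’t know of an established recipe for this, but as a strawman sketch of what “pathological underperformance” could mean at the loss level (all names and numbers are invented, and whether anything like this actually yields safe exclusion is exactly the open question):

```python
import torch

def combined_loss(good_losses, bad_losses, penalty=0.1, bad_floor=5.0):
    # One naive reading of "pathological underperformance": minimize loss on the
    # good tasks while pushing loss on the bad tasks up toward a floor value,
    # rather than simply leaving the bad tasks out of the training mix.
    good = torch.stack(good_losses).mean()
    bad = torch.stack(bad_losses).mean()
    return good + penalty * torch.relu(bad_floor - bad)

# Hypothetical usage with placeholder per-task loss values:
good = [torch.tensor(0.8, requires_grad=True), torch.tensor(1.1, requires_grad=True)]
bad = [torch.tensor(0.9, requires_grad=True)]
loss = combined_loss(good, bad)
loss.backward()  # a gradient-descent step lowers the good-task losses and raises the bad-task loss
```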
I notice comparing abstractions is very difficult. This is especially true in the context of abstractions we don’t know about yet. So I am wondering how we might be able to compare abstractions at a glance.
Since this involves thinking about abstraction space, I am inclined to just take the space part literally—we’ll look at 3 dimensions, with transformations and color. The question becomes, what the heck would these represent? Wild-ass-guesses:
Compactness/elegance of the abstraction
How much does the abstraction compress/explain
Computability
Perhaps we could also represent some kind of relationship between the abstraction and the humans using it, like how easy it is to fit in our heads, or to transmit to other humans. Then there could be some kind of orthographic projection where one projection shows the shape of the abstraction’s power, another its ease of use, etc., where each of these is itself an abstraction in a different corner of abstraction space.
I read Duncan’s posts on concentration of force and stag hunts. I noticed that a lot of the tug-of-war he describes seems to stem from the fact that the object-level stuff of a post and the meta-level stuff (by which I mean rationality) of the post compete for the same attention. The posts also take the strong position that eliminating the least-rational is the way to improve LessWrong along the dimension the posts are about.
I feel we can do more to make getting better at rationality easier through redirecting some of our efforts. A few ideas follow.
I want to be able to emphasize how to make a great comment, and therefore a great contribution to the ongoing discussion. Some people have the norm of identifying good comments, but that doesn’t help as much with how to make them, or what the thought process looks like. It would also be tedious to do this for every comment, because the workload would be impossible.
What if there were some kind of nomination process, where if I see a good comment I could flag it in such a way the author is notified that I would like to see a meta-comment about writing it in the first place?
I already enjoy meta-posts which explain other posts, and the meta-comments during our annual review where people comment on their own posts. The ability to easily request such a thing in a way that doesn’t compete for space with other commentary would be cool.
What about a parallel kind of curation, where posts with a special R symbol or something are curated by the mods (maybe plus other trusted community members) exclusively on their rationality merits? I mention this because the current curation process uses more general intellectual-pipeline criteria, of which rationality is only a part.
My reasoning here is that I wish it were easier to find great examples to follow. It would be good to have a list of posts to look up to, in the sense of “display rationality in your post the way these posts display rationality.”
It would be nice if we had a way to separate what a post was about from the rationality displayed by the post. Maybe something like the Alignment Forum arrangement, where there is a highly-technical version of the post and a regular public version of the post, but we replace the highly technical discussion with the rationality of the post.
Another comparison would be the Wikipedia talk pages, where the page has a public face but the talk page dissecting the contents has to be navigated to specifically.
My reasoning here is that when reading a post and its comments, the subject of the post, the quality of the post on regular stylistic grounds, and the quality of the post on rationality grounds all compete for my bandwidth. Creating a specific zone where attention can be focused exclusively on the rationality elements will make it easier to identify where the problems are, and capitalize on the improvements thereby.
In sum: the default view of a post should be about the post. We should have a way to be able to only look at and comment on the rationality aspects.
For the vast majority of posts, I don’t expect nor want a separation between the rationality of the post and the target of the post. This is a place to rationally discuss rationalist topics, and if either side is lacking for a post, we should comment on and improve it.
For the few posts I think you’re talking about (posts about the communities and organizations loosely related to or with some membership overlap with the site), I might just recommend a tag and filtering for those of us who don’t care very much.
I also have a notion this would help with things like the renewal of old content by making it incremental. For example, there has been a low-key wish for the Sequences to be revised and updated, but they are huge and this has proved too daunting a task for anyone to volunteer to tackle by themselves, and Eliezer is a busy man. With a tool similar to this, the community could divide up the work into comment-size increments, and once a critical mass has been reached someone can transform the post into an updated version without carrying the whole burden themselves. Also solves the problem of being too dependent on one person’s interpretations.
I’ve just begun catching up on what Ethereum is doing now that 2.0 is upon us. This prompted a short reflection on smart contracts, and how smart contracts put a huge premium on clear metrics to make them resolvable automatically.
This is prime Goodhart territory. It’s already a common strategy to hedge your bets by dividing investment up among less and more risky payoffs in order to stabilize your expected benefit; I cannot for the life of me think of any reason the exact same reasoning (maybe even quantitatively) cannot be applied to proxies for a goal, where risk is replaced with accuracy or some other form of goodness-of-proxy.
This would complicate smart contracts a bit, because it adds another layer to what you want to consider in the resolution mechanisms: before, you needed to agree on the metric and then agree on a source to resolve the metric; now you need to agree on a suite of metrics with sources for each.
However, I feel like it weighs heavily in favor of the more modular approach to assembling software, so that people could publish open source contract resolution mechanisms which specify Metric X resolved by Source Y, and then these would be laying around ready to be added to any given smart contract you are designing. We would be able to rate the goodness of the metric-source combination independently of the contracts in which they are included; it would be easy to compare sources or metrics as a group; individuals or businesses could pre-commit to using libraries of specific mechanisms.
Then assembling a suite of metrics to hedge your Goodhart vulnerability is as simple as adding more of them to the contract, and you can get your smart contract resolution reasoned-rule-style by weighing the mechanisms equally.
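A sketch of what a reasoned-rule resolution could look like if the mechanisms were ordinary software components (the interfaces and names here are invented for illustration, not any existing smart-contract library):

```python
# Each mechanism pairs a metric with the source that resolves it and reports a
# value normalized to [0, 1]. The contract resolves on the equal-weighted mean.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Mechanism:
    metric: str
    source: str
    resolve: Callable[[], float]   # returns the metric's value, normalized to [0, 1]

def resolve_contract(mechanisms: List[Mechanism], threshold: float = 0.5) -> bool:
    # Reasoned-rule style: weigh every mechanism equally rather than betting on one proxy.
    score = sum(m.resolve() for m in mechanisms) / len(mechanisms)
    return score >= threshold

# Hypothetical usage: three independent metric/source pairs hedging one another.
suite = [
    Mechanism("on-time delivery", "Source Y", lambda: 1.0),
    Mechanism("defect rate", "Source Z", lambda: 0.7),
    Mechanism("customer acceptance", "Source W", lambda: 0.4),
]
print(resolve_contract(suite))  # True: mean 0.7 clears the 0.5 threshold
```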
If anyone knows where in the smart contract literature this is discussed, I would love to hear about it, because with the possible exception of saying Goodhart this can’t possibly be a new thought.
Some notes about modelling DSA, inspired by Review of Soft Takeoff Can Still Lead to DSA. Relevant chunk of my comment on the post:
My reasoning for why it matters:
DSA relies on one or more capability advantages.
Each capability depends on one or more domains of expertise to develop.
A certain amount of domain expertise is required to develop the capability.
Ideas become more difficult in terms of resources and time to discover as they approach the capability threshold.
Now this doesn’t actually change the underlying intuition of a time advantage very much; mostly I just expect that the ’10x faster innovation’ component of the example will be deeply discontinuous. This leads naturally to thinking about things like a broad DSA, which might consist of a systematic advantage across capabilities, versus a tall DSA, which would be more like an overwhelming advantage in a single, high import capability.
I feel like identifying the layers at work here would be highly valuable. I could also easily see specifying a layer below domain, called fields, which would allow the lowest level to map to how we usually track ideas (by paper and research group); domain would then be the more applied engineering/technician area of development; and finally capability describes the thing-where-the-advantage-is.
After teasing out several example capabilities and building their lower levels, it starts to look sort of like a multidimensional version of a tech tree.
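A bare-bones sketch of that layering (field → domain → capability), with thresholds standing in for “amount of expertise needed”; every name and number here is invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    name: str
    progress: float = 0.0           # accumulated ideas (papers, results)

@dataclass
class Domain:
    name: str
    fields: List[Field] = field(default_factory=list)

    def expertise(self) -> float:
        return sum(f.progress for f in self.fields)

@dataclass
class Capability:
    name: str
    domains: List[Domain] = field(default_factory=list)
    threshold: float = 10.0         # expertise needed before the capability exists

    def achieved(self) -> bool:
        return all(d.expertise() >= self.threshold for d in self.domains)

# Hypothetical example: one capability resting on two domains.
cap = Capability("autonomy", domains=[
    Domain("perception", [Field("vision", 12.0)]),
    Domain("control", [Field("RL", 6.0), Field("planning", 5.0)]),
])
print(cap.achieved())  # True: both domains clear the threshold
```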
I am also interested in accounting for things like research debt. Interpretive labor is really important for the lateral movement of ideas; leaning on Daniel’s post again for example, I propose that ideas pulled from the public domain would be less effectively used than those developed in-house. This could be treated as each idea having only fractional value, or as a time delay as the interpretive labor has to be duplicated in-house before the idea yields dividends.
I, like most people, thought C was the best way to do hardware aside from just writing in assembly or machine code directly.
There’s Assembly for those people who actually care about what the hardware is doing. The question is whether there can meaningfully be a language that’s as low level as Assembly but which also provides higher abstractions to programmers.
Is there a reason warfare isn’t modeled as the production of negative value?
The only economic analyses I have seen are of the estimating-cost-of-lost-production type, which I can only assume reflects the convention of converting everything to a positive value.
I’m not sure what you’re proposing—it seems confusing to me to have “production” of negative value. I generally think of “production” as optional—there’s a lower bound of 0 at which point you prefer not to produce it.
I think there’s an important question of different entities doing the producing and capturing/suffering the value, which gets lost if you treat it as just another element of a linear economic analysis. Warfare is somewhat external to purely economic analysis, as it is generally motivated by non-economic (or partly economic but over different timeframes than are generally analyzed) values.
You and everyone else; it seems I am the only one to whom the concept makes any intuitive sense.
But the bottom line is that the value of weapons is destruction, which is to say you are paying $X in order to take away $Y from the other side. Saying we pay $X to gain $Y value is utterly nonsensical, except from the perspective of private sector weapons manufacturers.
I agree that economic models are not optimal for war, but I see a significant problem in that the way we think about war and the way we think about war preparation are treated as separate magisteria, and as a consequence military procurement is viewed in Congress as an economic stimulus rather than something of strategic import.
I agree that economic models are not optimal for war
Go a little further, and I’ll absolutely agree. Economic models that only consider accounting entities (currency and reportable valuation) are pretty limited in understanding most human decisions. I think war is just one case of this. You could say the same for, say, having children—it’s a pure expense for the parents, from an economic standpoint. But for many, it’s the primary joy in life and motivation for all the economic activity they partake in.
But the bottom line is that the value of weapons is destruction.
Not at all. The vast majority of weapons and military (or hobby/self-defense) spending are never used to harm an enemy. The value is the perception of strength, and relatedly, the threat of destruction. Actual destruction is minor.
military procurement is viewed in Congress as an economic stimulus
That Congress (and voters) are economically naïve is a distinct problem. It probably doesn’t get fixed by the additional naivete of forcing negative-value concepts into the wrong framework. If it can be fixed, it’s probably by making the broken windows fallacy (https://en.wikipedia.org/wiki/Parable_of_the_broken_window) less common among the populace.
The value is the perception of strength, and relatedly, the threat of destruction. Actual destruction is minor.
The map is not independent of the territory, here. Few cities were destroyed by nuclear weapons, but no one would have cared about them if they couldn’t destroy cities. Destruction is the baseline reality upon which perceptions of strength operate. The whole value of the perception of strength is avoiding actual destructive exchanges; destruction remains the true concern for the overwhelming majority of such spending.
The problem I see is that war is not distinct from economics except as an abstraction; they are in reality describing the same system. What this means is we have a partial model of one perspective of the system, and total negligence of another perspective of the system. Normally we might say not to let the perfect be the enemy of the good, but we’re at the other end of the spectrum so it is more like recruiting the really bad to be an enemy of the irredeemably awful.
Which is to say that economic-adjacent arguments are something the public at large is familiar with, and their right-or-wrong beliefs are part of the lens through which they will view any new information and judge any new frameworks.
Quite separately, I would find economics much more comprehensible if it included negatives throughout; as far as I can tell there is no conceptual motivation for avoiding them, it is mostly a matter of computational convenience. I would be happy to be wrong; if I could figure out the motivation for that, it would probably help me follow the logic better.
So the first option, Verilog, is still looking up the ladder of abstraction, just from beneath where I want to be; the second option, the Legion Programming System, mostly looks at high-scale hardware from the side. I suppose the thing that would make me happiest is something like the Legion Programming System but for robotics instead of HPC. The thing I would most be able to take advantage of personally is basically just a C fork with a new abstract machine and compiler that better accounts for modern architecture. Given how much work “basically” is doing there, this amounts to a completely different language with a C-like syntax, and it seems unlikely to appear if no one is making it already.
My wife’s iPhone put together one of its Over the Years slideshows for my birthday, and she sent it to me because—exactly as the designers hoped—it contains several good memories. I have two wonderings about this.
First, the goodness in good memories. We periodically talk about the lowest-cost, best-return psychological intervention of gratitude, which seems heavily wrapped up with the idea of deliberately reviewing good memories. I know nothing about the algorithm that is used to put together these slideshows, but it seems to me it mostly triggers on various holidays and other calendar events. I wonder if there is a way to optimize this in the direction of maximizing gratitude, perhaps as a user-controlled feature. I wonder what maximizing gratitude actually looks like, in the sense of provoking the feeling as often as possible. Is there anything like an emotional-trigger spaced repetition system, and is there any kind of emotional-experience curve distinct from the forgetting curve?
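If I were sketching the emotional-trigger spaced repetition idea, the simplest version I can imagine is an expanding-interval scheduler with the same shape as Anki’s. This is only a toy under that assumption; the names and numbers are mine, not anything the Photos app or Anki actually exposes.

```python
# Toy "emotional-trigger" scheduler (all names and numbers invented). If a
# showing produced the feeling we wanted, wait longer before the next showing;
# if it fell flat, show it again sooner.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Slideshow:
    slideshow_id: str
    interval_days: float = 3.0          # wait this long before the next showing
    due: date = date.today()

def review(s: Slideshow, felt_gratitude: bool, today: date) -> None:
    """Update the schedule after a showing."""
    if felt_gratitude:
        s.interval_days *= 2.0          # expand the spacing, like Anki's "good"
    else:
        s.interval_days = max(1.0, s.interval_days * 0.5)
    s.due = today + timedelta(days=round(s.interval_days))

show = Slideshow("birthday-over-the-years")
review(show, felt_gratitude=True, today=date.today())
print(show.due, show.interval_days)     # next showing in ~6 days
```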
Second, speaking of the spaced repetition system, is the memory in good memories. I have traditionally been almost totally indifferent to pictures and actively hostile to the taking of them; I have seen a lot of experiences skipped or ruined outright by the insistence on recording them for the memories. I still enjoy these sorts of slideshows when they come up, which amounts to me free riding on the labor of chronicling our lives. Sometimes they capture moments I have forgotten, which suggests there is some value in deliberately reviewing one’s life.
A stark case is Alzheimer’s. The classic horror is the forgetting of loved ones; if we took something like the Over the Years slideshows and put them on an SRS schedule, would people suffering from Alzheimer’s be able to remember their loved ones for longer? Can we even model Alzheimer’s as something like a faster forgetting curve?
Expanding the idea beyond treatment into a tool, what about for networking either professionally or socially? There already seems to be an abundant supply of photos and videos with brief acquaintances. Would anyone be interested in hijacking the slideshows for deliberately remembering the names of people they met at a party or professional convention?
A skilled hacker could install themselves in your memories. Add their face to your photos, first as one of many people in the background, then gradually more and more, until you would start believing they were the best friend in your childhood. Each time you review your memories, you create fake memories of your “friend”.
Also, consider the possibilities this would offer for advertising. I bet you didn’t notice that each of your best memories involves you holding a bottle of Coke. But the more you review the photos, the more obvious it becomes.
My reaction is “Ha!” and/or “Ew.”
This should also be transformed into a horror movie in the direction of Ring and Insidious, where each time we view the photos the malevolent spirit gets more integrated into our lives, and when it shows up in a best-friends-forever photo alone with us, it gets to assume our lives.
And enjoy a refreshing Coca-Cola, of course.
I’m confused about algorithms vs scaling conversations. I’ve been noodling around this question space since reading Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? a couple weeks ago, and the more I think about predictions in terms of algorithm and scale the more complete my confusion.
I currently have it head-chunked this way:
Architecture: neural nets are a family of architectures. Transformers are an example of a specific architecture.
Meta-algorithm: the arrangement of and training procedure for the stack of transforms; this seems to be what people are referring to when they say algorithmic improvements. I think of it this way because it is an algorithm-finding-algorithm.
Algorithm: the learned algorithm found by the meta-algorithm, being used to handle inputs in production. There is high uncertainty about what exactly these are, since we can only easily see the weights/activations.
Algorithmic uncertainty: by default we don’t know what the algorithm is in any detail. Mechanistic Interpretability is about reducing algorithmic uncertainty.
Scale compute. This means stuff like GPUs, TPUs, total number of flops, etc.
Scale data. Normally this means unstructured; I don’t actually have any sense of what scaling structured data looks like other than it really seemed to help AlphaFold.
Scale models. This means increasing the number of parameters which are trained, which seems to correlate with compute but strictly speaking should be independent of it.
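One thing that helps me keep the last three distinct is the rule-of-thumb approximation that training compute for a dense transformer is roughly 6 × parameters × training tokens; a toy calculation with invented numbers, just to show they are separate dials:

```python
# Rule-of-thumb link between the three "scale" knobs: training compute (FLOPs)
# is roughly 6 * parameters * training tokens for a dense transformer.
# The numbers below are made-up placeholders.
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

params = 1e9                  # "scale models": 1B parameters
tokens = 2e10                 # "scale data": 20B training tokens
flops = approx_training_flops(params, tokens)   # "scale compute"
print(f"~{flops:.1e} FLOPs")  # ~1.2e+20
```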
But none of the conversations seem to match these divisions, which means at least I am out of sync with everyone else, and probably that I am wrong in several dimensions.
Communication Pollution
I think social media has brought us to a point in the development of communication where it could now be considered communication pollution. I model this like light:
Nordhaus traced the falling price of light over time in real terms.
Now light is so cheap it is essentially free, and so pervasive it causes harm. This is called light pollution.
There are organizations dedicated to restoring darkness hoping to reverse these harms.
The National Institutes of Health have sponsored research into the impact on humans.
In short, communication is so convenient there is no incentive to make it good. Since there is no incentive to make it good, the bad (by which I mean low-quality) radically outnumbers the good, and eats up the available attention-bandwidth.
At the same time most people underconsume light, both in the amount of lumens they have available and in the ability to set light color differently at different times of day.
This appears to be the strategy the organizations dedicated to darkening are pursuing.
Popular example: many streetlights throw light directly into the sky or sideways. These organizations advocate for choosing the more expensive streetlights that direct the light onto the walking areas people actually use. The net result is usually a better-illuminated sidewalk.
I’ve seen research and demo houses that employ the different-light-over-the-course-of-the-day approach, but I have not seen anything about trying to get developers to offer it. In this way it falls into the same gap as any given environmental improvement in home/apartment building; people would probably buy it if it were available, but they don’t have the option to select it because it isn’t offered; there’s no real way to express demand to developers short of people choosing to do independent design/builds.
I feel like this should also be handled by a contractor, but there’s not much in the way of ‘lighting contractors’ to appeal to; it seems like an electrician would have to shift their focus, or an interior decorator would have to expand into more contractor-y work.
The technology is commercially sold with Philips Hue (and a few lesser-known brands). It’s just a matter of putting the right bulbs into your existing fixtures and setting up when you want which light.
My room is red at the end of the day and my screen is reddish via f.lux.
Yeah. Digital communication removes costs such as paying for paper etc. But after all these costs are gone, two remain:
the time and work it cost you to produce the information;
the time and work it costs other people to select the information they want.
Here the relation is a bit adversarial: the delivery of the right information to the right reader is a cost that the writer and the readers must split somehow.
The writer may pay the cost by thinking twice whether something should be written at all, by doing some research and writing it well, perhaps by tagging it somehow so the readers can easily (preferably automatically) decide whether they are interested. Or the writer may simply throw around many scraps of content, of random topic, length, and quality, and let the readers sort it out.
This is partially about the writers (choosing to write about something, doing your homework first) and partially about the publishing systems (do they support tagging and filtering by tags, preferentially following your favorite writers, is there a possibility to mark something as “important” and a punishment for abusing this flag). Maybe the systems are frustrating because designing a non-frustrating system is extremely difficult. Or maybe the systems are frustrating because anything more complicated than “throw around scraps of content” would be too complicated to use for most users.
And the advertising! How could I expect companies to give me only the content I am interested in and filter out everything annoying, if “giving me the annoying things” is precisely their business model? So not only are good systems difficult to design and unlikely to become popular, there is even smaller incentive to create them.
I don’t think that either Google or Facebook is well described as “giving you the annoying things”. Both companies invest in giving people ads that they might engage with.
This happens both because they want to keep their users rather than push them away, and because the people who appreciate a given ad are the most likely to engage with it.
If I search for a new microwave on Google and then get a bunch of new microwave ads everywhere I browse I’m likely getting those at the time of my life where I’m least bothered by them given that I actually want to buy a microwave.
When I buy a microwave, I typically open the websites of the few shops I trust, search for the products, compare their parameters, and read the reviews. I don’t really understand what extra value I get from Google showing me more microwave ads when I am trying to listen to some music on YouTube afterwards. My priors for Google showing me a better microwave are quite low. (But this is all hypothetical, because Google will show me the ad for Grammarly regardless.)
My knowledge may be obsolete, but the last time I checked, there were essentially two models: “pay per view” and “pay per click”. Given that I am the kind of user who almost never clicks on ads, it makes more sense to show me the “pay per view” ones, right?
But then, the income from ads shown to me is proportional to the number of ads shown to me. So the optimal amount of ads to show me is… smaller than the amount that would make me stop using the website… but not much smaller. Did I get my math wrong somewhere?
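To check myself, here is the toy version of the trade-off I have in mind; the revenue-per-view number and the linear walk-away curve are both made up, the point is only the shape:

```python
# Toy model: the site earns a fixed amount per ad view, and my probability of
# sticking around falls off linearly as the ad load rises (both assumptions
# invented for illustration).
def expected_revenue(ads_per_page: int, revenue_per_view: float = 0.002,
                     walk_away_at: float = 8.0) -> float:
    p_stay = max(0.0, 1.0 - ads_per_page / walk_away_at)
    return ads_per_page * revenue_per_view * p_stay

best = max(range(9), key=expected_revenue)
print(best, expected_revenue(best))  # peaks at 4 ads, below the walk-away point of 8
```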
I was trying out a different intuition for articulating the problem of AGI, and thought I would ask ChatGPT about it. I am curious how GPT4 would respond differently, and may ask some of the other LLMs about it.
Imagine a general optimizer function which optimizes the value of any function (or functions) passed to it. We pass a set of functions, A, to the general optimizer function. What happens to the value of all other functions, ~A?
The behavior of the general optimizer function towards functions outside of the set A would depend on the implementation of the function and the optimization algorithm used.
In general, the optimizer function would focus on optimizing the value of the functions in set A, and may not directly consider the value of the other functions. However, the optimization process could indirectly impact the values of other functions, especially if the functions in set A are dependent on the values of other functions.
For example, if the functions in set A are used as inputs to other functions, then optimizing the value of the functions in set A could indirectly impact the values of these other functions. Additionally, some optimization algorithms may explore the space of function values beyond the set A, in which case the values of functions outside of A could also be impacted.
In summary, the behavior of the general optimizer function towards functions outside of the set A would depend on the implementation and specific circumstances, but in general, the optimizer would focus on optimizing the values of the functions in set A.
If there are finite resources, and every function requires resources, then what happens to the functions outside of the set A?
If there are finite resources and every function requires resources, then the resources allocated to optimize the functions in set A would be taken away from other functions. This could result in several outcomes for functions outside of set A:
Some functions outside of set A may continue to operate as usual, as they do not depend on the resources that are being allocated to optimize the functions in set A.
Some functions outside of set A may experience reduced performance or functionality, as resources are redirected towards optimizing the functions in set A.
Some functions outside of set A may fail completely if they are highly dependent on resources that are being allocated to optimize the functions in set A.
The specific outcome for each function outside of set A would depend on the relative importance of the function, its resource requirements, and the resources being allocated to optimize the functions in set A.
Therefore, in situations where there are finite resources and every function requires resources, it is important to carefully consider the trade-offs between optimizing a set of functions and the impact on other functions that are not part of that set. It may be necessary to prioritize certain functions over others, or to optimize multiple sets of functions in a phased approach to minimize the impact on other functions.
If the general optimizer function is among the functions in set A, and there are finite resources, and every function requires resources, then what happens to the functions not in A?
If the general optimizer function is included in the set of functions A, then optimizing the functions in set A would also include optimizing the general optimizer function itself.
If there are finite resources and every function requires resources, then the resources allocated to optimize the functions in set A, including the general optimizer function, would be taken away from other functions not in set A. This could lead to the outcomes discussed in my previous response, where the functions outside of set A may experience reduced performance or fail completely if they are highly dependent on resources that are being allocated to optimize the functions in set A.
In addition, if the general optimizer function is not optimized effectively, then it may not be able to optimize the other functions in set A as well as it could. This could result in suboptimal solutions or increased resource consumption for the optimization process.
Therefore, if the general optimizer function is included in the set of functions A, it is important to ensure that the resources allocated to optimizing the general optimizer function are carefully balanced with the resources allocated to optimizing the other functions in set A, as well as any other functions outside of set A that are important to the overall system. This may involve optimizing the general optimizer function separately, or using a phased approach to optimize different sets of functions.
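As a sanity check on the resource-competition point, here is a toy calculation of my own (every name and number is invented): a fixed budget gets shifted toward the optimized set A, and everything outside A degrades.

```python
# Toy illustration: a fixed resource budget is split across functions; the
# optimizer pulls budget toward the set A it was asked to optimize, so the
# functions outside A are left with less.
functions = {"a1": 10.0, "a2": 10.0, "x": 40.0, "y": 40.0}   # initial allocation
optimized_set = {"a1", "a2"}

def reallocate(alloc: dict, favored: set, fraction: float = 0.5) -> dict:
    """Move `fraction` of every non-favored function's resources to the favored set."""
    new = dict(alloc)
    taken = 0.0
    for name in new:
        if name not in favored:
            taken += new[name] * fraction
            new[name] *= 1.0 - fraction
    for name in favored:
        new[name] += taken / len(favored)
    return new

print(reallocate(functions, optimized_set))
# {'a1': 30.0, 'a2': 30.0, 'x': 20.0, 'y': 20.0} -- total unchanged, ~A halved
```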
Specialization of Labor in Research
I think we do specialization of labor wrong a lot. This makes me think there is a lot to be gained from a sort of back-to-basics approach, and I feel like knowledge work in general and research in particular are good candidates.
The classic example of specialization of labor compares the output of a single blacksmith producing pins on his own with one tenth of the output of a factory where 10 men produce pins, each responsible for 1-3 tasks in the pin-making process. Each man gets very good at his appointed tasks, and as a consequence the per-worker output is 4,800 pins per day, while the lone metalworker’s output is perhaps 20 pins per day.
Imagine for a moment how research is done: there is a Principal Investigator who runs a lab and works with grad students and postdocs. Normally when the PI has a research project, he will assign the work to his different assistants, with so-and-so gathering the samples, and so-and-so running them through the equipment, and so-and-so preparing some preliminary analysis. In this way the labor is divided; Adam Smith called it division of labor; why don’t I think it counts?
Because they aren’t specializing. The pitch with the pin factory is that each worker gets very good at their 1-3 pin-making tasks. The division of labor is ad-hoc in the research case; in fact the odds are good that the opportunity to get very good is effectively avoided, because each assistant is expected to be competent in each of the relevant tasks.
This is because a scientist is a blacksmith-of-abstractions, and the grad students and postdocs are the apprentices. The point is only mostly to produce research; the other point is to turn the assistants into future PIs themselves.
This is how “artisanal” small research labs work, but larger research groups usually have more specialisation—especially in industry, which happens to have almost all the large-scale research groups. People might specialise in statistics and data analysis, experimental design, lab work, research software engineering, etc. Bell Labs did not operate on the PI model, for example; nor does DARPA or Google or …
I’d be interested in hearing more about how Google runs things, which I have no knowledge of; I note that Bell Labs and DARPA are the go-to examples of standout production, and one of them is defunct now. This also scans with (what I understand to be) the usual economic finding that large firms produce more innovation on average than small ones.
I have other examples from industry I’ve been thinking about where the problem seems to be dividing labor at the wrong level (or maybe not applying it completely at every level?). The examples I have in mind here are pharmaceutical companies, who underwent a long series of mergers starting in the 80s.
The interesting detail here is that they did do specialization very similar to the pin example, but they didn’t really account for research being a much broader problem than pin-making. Prior to the big downsizing of labs in pharma and chemicals, the story went that all these labs during mergers had been glommed together and ideas would just be put in one end and then either a working product came out the other end or not; there was no feedback loop at the level of the product or idea.
This looks to me like a case where small university labs are only specialized at the meta level, and the failing industrial labs are only specialized at the object level. It feels to me like if there were a graceful way to describe the notion that specialization has dimensionality, we’d be able to innovate better.
Does anyone know of a good tool for animating 3d shapes? I have a notion for trying to visualize changes in capability by doing something like the following:
A cube grows along the z axis according to some data
When a particular threshold is reached, a new cube branches off from that threshold, and extends in the x or y axis (away from the “trunk”)
When a particular threshold is reached on that cube, the process repeats
In this way there would be kind of a twisted tree, and I could deploy our intuitions about trees as a way to drive the impact. I thought there would be something like a D3 for these kinds of manipulations, but I haven’t found it yet.
There is http://www.povray.org/ which renders 3D pictures and animations similarly to Blender, except that instead of having an editor, you describe the shapes by equations.
Like, you can specify that a cube is at some coordinates, scaled and rotated depending on the time variable, and conditional on time being greater than some value, another cube exists, etc.
If the animation you want to make is simple from the mathematical perspective, you could either write the code that animates the shapes, or alternatively you could write a script in your favorite language that would animate the shapes and output recipes for rendering static pictures for individual frames.
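For example, a minimal sketch of that script approach in Python, emitting one scene file per frame for the growing-and-branching cube described above; I am writing the POV-Ray syntax from memory, so treat it as approximate:

```python
# Write one .pov scene per frame: a trunk cube grows along z, and once it
# passes a threshold a branch cube starts growing along x.
THRESHOLD = 2.0

def scene(t: float) -> str:
    trunk_height = 0.5 * t
    parts = [
        "camera { location <6, 6, -12> look_at <0, 0, 2> }",
        "light_source { <10, 20, -10> color rgb <1, 1, 1> }",
        f"box {{ <0, 0, 0>, <1, 1, {trunk_height:.2f}> pigment {{ color rgb <0.8, 0.3, 0.3> }} }}",
    ]
    if trunk_height > THRESHOLD:        # branch off once the trunk passes the threshold
        branch_length = 0.5 * (trunk_height - THRESHOLD)
        parts.append(
            f"box {{ <1, 0, {THRESHOLD}>, <{1 + branch_length:.2f}, 1, {THRESHOLD + 1}> "
            "pigment { color rgb <0.3, 0.8, 0.3> } }"
        )
    return "\n".join(parts)

for frame in range(16):
    with open(f"frame_{frame:03d}.pov", "w") as f:
        f.write(scene(float(frame)))
```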
Recently I saw a video about shadertoy.com. I think the two websites are similar.
Boom! That is exactly the kind of thing I needed. Thank you!
Is spaced repetition software a good tool for skill development or good practice reinforcement?
I was recently considering using an Anki prompt to do a mental move rather than to test my recall, like tense your muscles as though you were performing a deadlift. I don’t actually have access to a gym right now, so I didn’t get to put it into action immediately. Visualizing the movement as vividly as possible, and tensing muscles like the movement was being performed (even when not doing it) are common tricks reported by famous weightlifters.
But I happened across an article from Runner’s World today which described an experiment where all they did was tell a group of runners the obvious things that everyone already knows about preventing injury. The experimental group saw ~13% fewer injuries.
This suggests to me that my earlier idea is probably a good one, even though it isn’t memory per se. The obvious hitch is that what I am after isn’t actually recall—it isn’t as though runners forget that overtraining leads to injury if you were to ask them, and I have never forgotten how to do a deadlift.
Rather the question is how to make it as correct and instinctive as possible.
This feels like a physical analogue of my earlier notion about Ankifying the elements of a problem, so as to integrate it into my perspective and notice relevant information.
Maybe a better way to say this is using an Anki prompt to help respond to a physical prompt, that being the task itself.
A physical action in response to the physical task instinctively already has a name; it is called muscle memory.
Gwern covers a bit of research here on when spacing does and doesn’t work:
https://www.gwern.net/Spaced-repetition#subjects
Personally I’ve found the biggest problem with spaced repetition for skills and habits is that it’s contextless.
Adding the context from multiple skills with different contexts makes it take way more time, and not having the context makes it next to useless for learning the skills.
Could you talk a bit more about this? My initial reaction is that I am almost exactly proposing additional value from using Anki to engage the skill sans context (in addition to whatever actual practice is happening with context).
I review Gwern’s post pretty much every time I resume the habit; it doesn’t look like it has been evaluated in connection with physical skills.
I suspect the likeliest difference is that the recall curve is going to be different from the practice curve for physical skills, and the curve for mental review of physical skills will probably be different again. These should be trivial to adjust if we knew what they were, but alas, I do not.
Maybe I could pillage the sports performance research? Surely they do something like this.
It is hard to find, but it’s covered here: https://www.gwern.net/Spaced-repetition#motor-skills
My take is pretty similar to cognitive skills: It works well for simple motor skills but not as well for complex skills.
My experience is basically that this doesn’t work. This seems to track with the research on skill transfer (which is almost always non-existent or has such a small effect that it can’t be measured.)
Ah, the humiliation of using the wrong ctrl-f inputs! But of course it would be lower level.
Well that’s reason enough to cap my investment in the notion; we’ll stick to cheap experiments if the muse descends.
I have some cards in my Anki collection that ask me to review dance moves, and I have found that it is helpful for making sure my body remembers how to do them.
I wonder how hard it would be to design a cost+confidence widget that would be easily compatible (for liberal values of easy) with spreadsheets.
I’m reading a Bloomberg piece about Boeing which quotes employees talking about the MAX as being largely a problem of choosing the lowest bidder. This is also a notorious problem in other places where there are rules which specify using the lowest cost contractor, like parts of California and many federal procurement programs. It’s a pretty widespread complaint.
It would also be completely crazy for it to be any other way, for what feels like a simple reason: no one knows anything except the quoted price. There’s no way to communicate confidence simply or easily. The lowest bid is easily transmitted through any spreadsheet, word document, or accounting software, quality of work be damned. Any attempt to even evaluate the work is a complicated one-off report, usually with poorly considered graphics, which isn’t even included with the budget information so many decision makers are unlikely to ever see it.
So, a cost+confidence widget. On the simple end I suppose it could just be a re-calculation using a reference class forecast. But if I’m honest what I really want is something like a spreadsheet where each cell has both a numerical and graphical value, so I could hover my mouse over the number to see the confidence graph, or switch views to see the confidence graphs instead of the values.
Then if we’re getting really ambitious, something like a time+confidence widget would also be awesome. When the time+confidence and cost+confidence values are all multiplied together, the output is like a heat map of the two values, showing the distribution of project outcomes overall.
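As a toy of what that multiplication would mean in practice, something like the following Monte Carlo sketch; the lognormal shapes and all of the parameters are stand-ins I made up, not a recommendation:

```python
# Draw cost-rate and schedule from placeholder distributions and multiply them
# into a distribution of total project outcomes.
import random

def draw_outcomes(n: int = 10_000) -> list[float]:
    outcomes = []
    for _ in range(n):
        cost_per_month = random.lognormvariate(2.0, 0.4)   # e.g. $k per month
        months = random.lognormvariate(3.0, 0.5)           # schedule estimate
        outcomes.append(cost_per_month * months)           # total project cost
    return outcomes

samples = sorted(draw_outcomes())
p10, p50, p90 = (samples[int(len(samples) * q)] for q in (0.10, 0.50, 0.90))
print(f"P10 {p10:.0f}  P50 {p50:.0f}  P90 {p90:.0f}")       # the spread is the point
```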
Are math proofs useful at all for writing better algorithms? I saw on Reddit recently that they proved Batchelor’s Law in 3D, the core idea of which seems to be using stochastic assumptions to prove it cannot be violated. The Quanta article does not seem to contain a link to the paper, which is weird.
Batchelor’s Law is the experimentally-observed fact that turbulence occurs at a specific ratio across scales, which is to say when you zoom in on a small chunk of the turbulence it looks remarkably like all of the turbulence, and so on. Something something fractals something.
Looking up the relationship between proofs and algorithms mostly goes to proofs about specific algorithms, and sometimes using algorithms as a form of proof; but what I am after is whether a pure-math proof like the above one can be mined for useful information about how to build an algorithm in the first place. I have read elsewhere that algorithmic efficiency is about problem information, and this makes intuitive sense to me; but what kind of information am I really getting out of mathematical proofs, assuming I can understand them?
I don’t suppose there’s a list somewhere that handily matches tricks for proving things in mathematics to tricks for constructing algorithms in computer science?
A proof may show that an algorithm works. If the proof is correct*, this may demonstrate that the algorithm is robust. (Though you really want a proof about an implementation of the algorithm, which is a program.)
*A proof that a service will never go down which relies on assumptions with the implication “there are no extreme solar storms” may not be a sufficient safeguard against the possibility that the service will go down if there is an extreme solar storm. Less extremely, perhaps low latency might be proved to hold, as long as the internet doesn’t go down.
How are algorithms made, and how can proofs improve/be incorporated into that process?
Given a problem, you can try to solve it (1). You can guess (2). You can try (one or more) different things and just see if they work (3).
1 and 2 can come apart, and that’s where checking becomes essential. A proof that the method you’re using goes anywhere (fast) can be useful there.
Let’s take a task:
Sorting. It can be solved by:
1. Taking a smaller instance, solving that (and paying attention to process). Then extract the process and see how well it generalizes
2. Handle the problem itself
3. Do something. See if it worked.
2 and 3 can come apart:
At its worst, 3 can look like Bogosort. Though that process can be improved: look at the first two elements. Are they sorted? No: shuffle them. Look at the next two elements...
4! = 24, twenty-four permutations of 4 elements. The sorting so far has eliminated most of them, leaving only the orderings where each pair is already ascending:
1, 2, 3, 4
1, 3, 2, 4
1, 4, 2, 3
2, 3, 1, 4
2, 4, 1, 3
3, 4, 1, 2
Now all that’s needed is a method of shuffling that doesn’t make things less orderly… And eventually Mergesort may be invented.
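A minimal sketch of approach 3 in code, with Bogosort as the pure “do something, see if it worked” baseline and a pairwise pass as the improvement; I have simplified the pair-checking to adjacent pairs, which makes it one pass of bubble sort:

```python
import random

def is_sorted(xs: list) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogosort(xs: list) -> list:
    while not is_sorted(xs):            # do something (shuffle), then check
        random.shuffle(xs)
    return xs

def pairwise_pass(xs: list) -> list:
    for i in range(len(xs) - 1):
        if xs[i] > xs[i + 1]:           # look at a pair; if unsorted, fix just that pair
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs                           # repeat the pass until is_sorted() says done

print(bogosort([2, 4, 1, 3]), pairwise_pass([2, 4, 1, 3]))
```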
In the extreme, 3 may be ‘automated’:
programs write programs, and test them to see if they do what’s needed (or a tester gets a guesser thrown at it, to ‘crack the password’)
evolutionary algorithms
The post you linked to (algorithmic efficiency is about problem information) - the knowledge that method X works best when conditions Y are met, which is used in a polyalgorithmic approach? That knowledge might come from proofs.
Special Purpose Permit Companies
How many kinds of environmental and/or building permits could be gotten by company A, and then used for company B’s project, after company B buys company A? For example, could we form a special purpose company to pursue navigating all the NEPA stuff for building a power plant at a particular site, and then sell that company to a power company, who could then begin construction immediately?
Consider real estate developers: one company buys the land and develops it, which means doing all the construction, and then sells it to other companies who use the buildings.
I propose an extension of this model, where one company will buy the land and do the legal/regulatory work, and then sell the site+permits to another company who will do the construction of the facility.
The advantage here is that rather than one company carrying the whole risk of the land and permitting process through to completion of the needed facility, the land and permitting can be handled by a special purpose company which can then be sold, construction-ready, to a market of potential buyers. Because there are multiple potential buyers, the risks are lower than they are for one company doing the whole process alone; and since you normally can’t sell permits/licenses/etc. once they are granted, under the current model the regulatory work is simply wasted if the project falls through.
If widely adopted as a practice, what I hope this would allow is for a lot of potential building sites for needed infrastructure to be prepared for construction simultaneously. While this will do nothing to speed up projects starting now, it would potentially cut years off the construction time of the next generation of facilities, since an abundance of sites would already be waiting for them.
Things to consider:
Pays out faster than regular real estate development on a within-sector basis.
Here I envision industrial/infrastructure uses, but for areas that are sufficiently difficult (New York or California, for example) it could be viable for things like housing in combination with traditional development (permit company → development company → homebuyer).
You almost definitely cannot get “general construction” permits whereby you can build anything, which on the one hand is limiting and requires strategic choices in permitting.
On the other hand it allows some strategic influence over the development of useful sites.
I wonder how feasible it would be to make something that would allow people to coordinate for consumption like unions for bargaining, parties for politics, or corporations for production. In particular I have in mind targeting entertainment and social media consumption because of the prevalence of recommendation algorithms here.
The intuition is that these algorithms are political computations: who gets the advertising traffic, what type of content is in demand, which consumers are catered to.
The thing being coordinated on is the consumer behavior; namely what people click on.
Examples:
like a union, but instead of collective bargaining to sell labor, it is collective behavioring[1] to consume content.
like the old website Massdrop, but instead of coordinated buying it is coordinated clicking.
like a political party, but instead of coordinating votes, it coordinates clicks.
like a company, but instead of coordinating work, it coordinates consumption.
Concretely, I have in mind something like a browser widget that overlays indicators of the most impactful consumption choice to make to influence the algorithm in the direction you want, such as an icon or a color filter. This would run over things like youtube, instagram, tiktok, etc.
Another way to think of this: a value handshake between whatever algorithm the company is using and an algorithm you would choose instead.
[1] behavioring—by this I mean consciously conforming to a pattern of behavior so it shows up in behavior analysis, in this case the consumer behavior tracking which is part of the function of recommendation algorithms.
I read a story once about a billionaire who managed to flout a zoning limitation by building himself an absurdly huge high-rise single apartment. Once that was done, he waited a bit and, either using a loophole that already existed or one he finagled into the law, added a bunch of walls inside his personal mega-apartment and then sold off all these new units as regular apartments.
Now I think of this in the context of the American dredging and shipping laws. There’s a lot of rules surrounding dredges and shipping vessels, but there are a lot fewer surrounding luxury items like yachts. So imagine a group of investors building a big, state of the art shipyard that builds yachts, but which can be converted for little or no additional cost to build dredges or vessels suitable for river shipping or similar.
A simpler alternative scheme is to just comply with the laws but use yachts or similar less-regulated vessels as a lead product to build capacity for the more economic ones, analogous to how Tesla strategically built an electric sports car to generate interest and funds for eventually building electric sedans.
Does compression research have any analogue of training behavior in ML research?
Watched a talk by Ilya Sutskever about generalization, during which he covered the equivalence of prediction and compression, and then went further into maximum likelihood. The talk was framed around how this way of thinking has been a helpful perspective for him since 2016 or so.
In the talk he makes an analogy to Kolmogorov complexity, and then extends it so that conditional Kolmogorov complexity can be thought of as Kolmogorov compression and uses that when talking about why unstructured learning might work.
At 38:30 there’s a question/comment about where the Kolmogorov analogy breaks down, which was that in Kolmogorov the order of the data doesn’t matter but in training neural networks it does, and as a result they have training behaviors.
Is this something that has ever shown up in compression research, about which I know almost nothing? I realize in the typical case it is mostly for things that we can read all of simply enough, like files on a computer, and compression is mostly for trading off resources like storage or bandwidth against time or CPU, but that’s because we have really well developed applied cases. Does compression research contain work on “general compressor” algorithms the way a transformer is a general predictor? Do compression algorithms experience compression behaviors the way a neural network experiences training behaviors?
This seems worth a search or two.
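To make the prediction/compression equivalence concrete for myself: an ideal code spends -log2(p) bits on a symbol the predictor assigned probability p, so a better predictor means fewer bits for the same string. A toy comparison, with both “predictors” invented:

```python
# A better next-symbol predictor compresses the same string into fewer bits.
import math

text = "abababababababab"

def bits_needed(seq: str, predict) -> float:
    return sum(-math.log2(predict(seq[:i], ch)) for i, ch in enumerate(seq))

def uniform(prefix: str, ch: str) -> float:
    return 1 / 2                        # no model: a or b, 50/50

def alternating(prefix: str, ch: str) -> float:
    if not prefix:
        return 1 / 2
    expected = "b" if prefix[-1] == "a" else "a"
    return 0.9 if ch == expected else 0.1   # has (mostly) learned the pattern

print(bits_needed(text, uniform))       # 16.0 bits
print(bits_needed(text, alternating))   # ~3.3 bits
```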
Rhetoric about AGI: Notes to self
Caught the aftermath of a contest for generating short, no-context arguments for why AGI matters. It appears there will be a long-form contest in the future; these are basically notes towards an entry.
The categories of audience are too vague; my principal interest is in policymakers, so dedicate some time to analysis of (and by extension strategy for) this group.
This will require some consideration of how policy actually gets done. Big question, but I have a line of attack on that problem.
Be explicit about the rhetoric. This will provide the context for the strategy, a pattern people may mimic for future strategies, and as a nice bonus will serve as advocacy for the skillset. (Assuming it doesn’t suck)
What I expect to be able to come up with is a description of what kind of policymaker we want to convince, and a flexible plan of attack for identifying and then approaching ones who seem to fit that description. This stems from a hypothesis that policy change is accomplished by a small cadre of policymakers dedicated to the policy in question; and not accomplished by popularity, even among policymakers.
What exactly the dedicated cadre does is unclear to me, but my suspicion is that it mostly comes down to attention to detail and opportunism. Attention to detail because there are a lot of levels to making policy work that go beyond getting a bill passed, and only a dedicated person would be willing to invest that attention; opportunism because a dedicated cadre can rapidly deliver a finished product with most of the kinks worked out as soon as a window opens.
Saw a YouTube video by a guy named Michael Penn about why there is no 3-dimensional equivalent of the complex numbers. It goes through an abridged version of the mathematical reasons, and I was able to follow along until we got to a point where he showed that it would have to commute ix with xi, which contradicts an initial required claim that ix does not commute with xi.
This is not satisfying to me intuitively. The thing that bothers me is that I can accept the argument that the definition is incoherent, but that doesn’t show me why we can’t get there using some different claim or procedure. Here’s what I came up with instead:
When we build the complex numbers out of the reals, rather than extending the reals by one dimension, what we are really doing is extending the reals by the size of the reals. So rather than:
reals + one dimension ⇒ complex
We have:
reals + reals ⇒ complex
Carrying this forward, extending the complex numbers to the next level up is applying the same procedure again, so rather than:
complex + one dimension ⇒ ?
We have:
complex + complex ⇒ quaternions
So if we do the thing we did to construct the complex numbers from the reals over again, what we get is the quaternions which have four dimensions rather than three.
I note that this intuition really answers the question of what you get when you extend the complex numbers, rather than the question of why you can’t have something like the complex numbers with three dimensions. For that, I think of the previous problem in reverse: In order to build a three dimensional number system using anything like the same procedure, we would need to have something extended by its own size to get us there:
something + something ⇒ threenions
Since the dimension is supposed to be three, that means we need a base number system of dimension 1.5. That’s a fraction; we might be able to do this, since fractional dimensions are how a fractal is made. But the fraction for 1.5 is 3⁄2, which means we need a three-dimensional number system in order to construct our three-dimensional number system, which is circular and suggests there isn’t a way to construct it (at least not one from the reals or complex numbers).
The video sounded too complicated. My own “proof” is imagining that we already have a number system with independent units 1, i, and j, and asking what i×j would be. Plausible answers are 1, −1, i, -i, j, or -j, and each of them quickly results in a problem. For example, if i×j=1, then j is actually 1/i, which already exists as a complex number, so it is not an independent third dimension. But if i×j=i, then j=1. Etc.
To put j outside the plane defined by 1 and i, the result of i×j must be yet another dimension, let’s call it k… and we have reinvented quaternions.
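Spelling out the “Etc.”, assuming multiplication stays associative and nonzero elements stay invertible, the way they do for complex numbers:

```latex
% Assume units 1, i, j with i^2 = -1, and ask what ij could be.
\begin{align*}
ij = 1  &\;\Rightarrow\; j = i^{-1} = -i, \text{ already a complex number} \\
ij = -1 &\;\Rightarrow\; j = -i^{-1} = i \\
ij = i  &\;\Rightarrow\; j = 1 \\
ij = -i &\;\Rightarrow\; j = -1 \\
ij = j  &\;\Rightarrow\; i = 1 \\
ij = -j &\;\Rightarrow\; i = -1
\end{align*}
% Every case collapses j (or i) back into the complex plane, so ij must point
% in a genuinely new direction -- call it k -- and the quaternions follow.
```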
I like this one better! It’s a more direct appeal to geometric intuition, which is the only area of math where I have any intuition at all.
Quick question: could we use the Gato trick of multi-task → single-model in reverse, such that we exclude tasks?
The idea is that we would specifically create training on “bad” tasks, like connect to the internet, or write code, and then build a single model which includes the “good” ones but excludes the “bad” ones.
Based on my understanding of how these things work there’s no sense in which the tasks would be rejected exactly; rather what I imagine is a kind of pathological underperformance. An analogy would be giving GPT-3 catastrophic dyslexia on purpose.
The natural downside is that we will deliberately be building components that are pretty good at bad tasks, which is dangerous. But our bigger problem is failing to screen out bad things and so they happen accidentally, and this feels at first blush like an option for making incremental progress, at least.
Graphical comparison of abstractions
I notice comparing abstractions is very difficult. This is especially true in the context of abstractions we don’t know about yet. So I am wondering how we might be able to compare abstractions at a glance.
Since this involves thinking about abstraction space, I am inclined to just take the space part literally—we’ll look at 3 dimensions, with transformations and color. The question becomes, what the heck would these represent? Wild-ass-guesses:
Compactness/elegance of the abstraction
How much does the abstraction compress/explain
Computability
Perhaps we could also impose some kind of relationship between the abstraction and humans’ relationship to it, like how easy it is to fit in our heads, or to transmit to other humans. Then there could be some kind of orthographic projection where one projection shows the shape of the abstraction’s power, another its ease of use, etc., where each of these is itself an abstraction in a different corner of abstraction space.
I read Duncan’s posts on concentration of force and stag hunts. I noticed that a lot of the tug-of-war he describes seems to stem from the fact that the object-level stuff of a post and the meta-level stuff (by which I mean rationality) of the post compete for the same attention. The posts also take the strong position that eliminating the least-rational is the way to improve LessWrong along the dimension the posts are about.
I feel we can do more to make getting better at rationality easier through redirecting some of our efforts. A few ideas follow.
I want to be able to emphasize how to make a great comment, and therefore a great contribution to the ongoing discussion. Some people have the norm of identifying good comments, but that doesn’t help as much with how to make them, or what the thought process looks like. It would also be tedious to do this for every comment, because the workload would be impossible.
What if there were some kind of nomination process, where if I see a good comment I could flag it in such a way the author is notified that I would like to see a meta-comment about writing it in the first place?
I already enjoy meta-posts which explain other posts, and the meta-comments during our annual review where people comment on their own posts. The ability to easily request such a thing in a way that doesn’t compete for space with other commentary would be cool.
What about a parallel kind of curation, where posts with a special R symbol or something are curated by the mods (maybe plus other trusted community members) exclusively on their rationality merits? I mention this because the curation process now uses more general intellectual-pipeline criteria, of which rationality is only a part.
My reasoning here is that I wish it were easier to find great examples to follow. It would be good to have a list of posts that were “display rationality in your post the way these posts display rationality” to look up to.
It would be nice if we had a way to separate what a post was about from the rationality displayed by the post. Maybe something like the Alignment Forum arrangement, where there is a highly-technical version of the post and a regular public version of the post, but we replace the highly technical discussion with the rationality of the post.
Another comparison would be the Wikipedia talk pages, where the page has a public face but the talk page dissecting the contents has to be navigated to specifically.
My reasoning here is that when reading a post and its comments, the subject of the post, the quality of the post on regular stylistic grounds, and the quality of the post on rationality grounds all compete for my bandwidth. Creating a specific zone where attention can be focused exclusively on the rationality elements will make it easier to identify where the problems are, and capitalize on the improvements thereby.
In sum: the default view of a post should be about the post. We should have a way to be able to only look at and comment on the rationality aspects.
For the vast majority of posts, I don’t expect nor want a separation between the rationality of the post and the target of the post. This is a place to rationally discuss rationalist topics, and if either side is lacking for a post, we should comment on and improve it.
For the few posts I think you’re talking about (posts about the communities and organizations loosely related to or with some membership overlap with the site), I might just recommend a tag and filtering for those of us who don’t care very much.
I also have a notion this would help with things like the renewal of old content by making it incremental. For example, there has been a low-key wish for the Sequences to be revised and updated, but they are huge and this has proved too daunting a task for anyone to volunteer to tackle by themselves, and Eliezer is a busy man. With a tool similar to this, the community could divide up the work into comment-size increments, and once a critical mass has been reached someone can transform the post into an updated version without carrying the whole burden themselves. Also solves the problem of being too dependent on one person’s interpretations.
Could we hedge Goodhart?
I’ve just begun catching up on what Ethereum is doing now that 2.0 is upon us. This prompted a short reflection on smart contracts, and how smart contracts put a huge premium on clear metrics to make them resolvable automatically.
This is prime Goodhart territory. It’s already a common strategy to hedge your bets by dividing investment up among less and more risky payoffs in order to stabilize your expected benefit; I cannot for the life of me think of any reason the exact same reasoning (maybe even quantitatively) cannot be applied to proxies for a goal, where risk is replaced with accuracy or some other form of goodness-of-proxy.
This would complicate smart contracts a bit, because it adds another layer to what you want to consider in the resolution mechanisms: before, you needed to agree on the metric and then agree on a source to resolve the metric; now you need to agree on a suite of metrics, with sources for each.
However, I feel like it weighs heavily in favor of the more modular approach to assembling software, so that people could publish open source contract resolution mechanisms which specify Metric X resolved by Source Y, and then these would be laying around ready to be added to any given smart contract you are designing. We would be able to rate the goodness of the metric-source combination independently of the contracts in which they are included; it would be easy to compare sources or metrics as a group; individuals or businesses could pre-commit to using libraries of specific mechanisms.
Then assembling a suite of metrics to hedge your Goodhart vulnerability is as simple as adding more of them to the contract, and you can get your smart contract resolution reasoned-rule-style by weighing the mechanisms equally.
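A sketch of what one of those equally weighted suites could look like; every metric name, source, and threshold here is a placeholder I invented, and the point is only the reasoned-rule shape:

```python
# Many proxy metrics, each resolved by its own source, weighted equally.
from typing import Callable

# Each mechanism returns a normalized score in [0, 1] for how well the goal was
# met according to one metric resolved by one source (placeholders throughout).
mechanisms: dict[str, Callable[[], float]] = {
    "on_time_delivery/source_a": lambda: 1.0,
    "defect_rate/source_b":      lambda: 0.6,
    "uptime/source_c":           lambda: 0.9,
}

def resolve(threshold: float = 0.7) -> bool:
    """Reasoned-rule style: weigh every mechanism equally, compare to a threshold."""
    scores = [fetch() for fetch in mechanisms.values()]
    return sum(scores) / len(scores) >= threshold

print(resolve())   # average 0.83 -> True
```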
If anyone knows where in the smart contract literature this is discussed, I would love to hear about it, because with the possible exception of saying Goodhart this can’t possibly be a new thought.
Some notes about modelling DSA, inspired by Review of Soft Takeoff Can Still Lead to DSA. Relevant chunk of my comment on the post:
I feel like identifying the layers at work here would be highly valuable. I could also easily see specifying a layer below domain as fields, which will allow the lowest level to map to how we usually track ideas (by paper and research group) which leaves domain the more applied engineering/technician area of development, and then finally capability describes the thing-where-the-advantage-is.
After teasing out several example capabilities and building out their lower levels, it starts to look sort of like a multidimensional version of a tech tree.
I am also interested in accounting for things like research debt. Interpretive labor is really important for the lateral movement of ideas; leaning on Daniel’s post again as an example, I propose that ideas pulled from the public domain would be used less effectively than those developed in-house. This could be treated as each external idea having only fractional value, or as a time delay while the interpretive labor is duplicated in-house before the idea yields dividends.
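As a toy sketch of what I mean (all layer names, ideas, and numbers here are invented), the field → domain → capability layering and the research-debt penalty might be represented like this, with the penalty modeled as a fractional discount on externally sourced ideas:

```python
# Toy sketch of the layered model: ideas live in fields, fields roll up into
# domains, domains feed capabilities. Ideas absorbed from outside carry a
# research-debt penalty, modeled here as a fractional discount (it could just
# as easily be a time delay). All names and numbers are invented.

from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    field: str
    domain: str
    capability: str
    value: float
    in_house: bool = True

RESEARCH_DEBT_DISCOUNT = 0.5  # external ideas yield half value until reinterpreted

def effective_value(idea: Idea) -> float:
    return idea.value if idea.in_house else idea.value * RESEARCH_DEBT_DISCOUNT

def capability_strength(ideas: list[Idea], capability: str) -> float:
    return sum(effective_value(i) for i in ideas if i.capability == capability)

ideas = [
    Idea("new solvent", "chemistry", "batteries", "cheap energy storage", 2.0),
    Idea("anode tweak", "materials", "batteries", "cheap energy storage", 1.0, in_house=False),
]

print(capability_strength(ideas, "cheap energy storage"))  # 2.0 + 0.5 = 2.5
```

Swapping the discount for a time delay would just mean tracking when each external idea was absorbed, rather than multiplying by a constant.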
There’s Assembly for people who actually care about what the hardware is doing. The question is whether there can meaningfully be a language that is as low-level as Assembly but which also provides higher abstractions to programmers.
Is there a reason warfare isn’t modeled as the production of negative value?
The only economic analyses I have seen are of the estimating-cost-of-lost-production type, which I can only assume reflects the convention of converting everything to a positive value.
But it is so damned anti-intuitive!
I’m not sure what you’re proposing—it seems confusing to me to have “production” of negative value. I generally think of “production” as optional—there’s a lower bound of 0 at which point you prefer not to produce it.
I think there’s an important question of different entities doing the producing and capturing/suffering the value, which gets lost if you treat it as just another element of a linear economic analysis. Warfare is somewhat external to purely economic analysis, as it is generally motivated by non-economic (or partly economic but over different timeframes than are generally analyzed) values.
You and everyone else; it seems I am the only one to whom the concept makes any intuitive sense.
But the bottom line is that the value of weapons is destruction, which is to say you are paying $X in order to take away $Y from the other side. Saying we pay $X to gain $Y value is utterly nonsensical, except from the perspective of private sector weapons manufacturers.
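As a toy illustration of the framing I mean, with entirely made-up numbers: the conventional approach books the spending as production, while the negative-value approach books what the spending is actually for, namely subtracting value from the other side.

```python
# Toy illustration of the two accounting conventions, with invented numbers.
# X: what we spend on the weapons; Y: the value they can destroy on the other side.

X = 10   # billions spent on munitions
Y = 25   # billions of the adversary's value destroyed if used

# Conventional framing: the spending shows up as positive production.
conventional_output = +X                       # +10

# Negative-value framing: we pay X in order to subtract Y from the other side.
our_ledger = -X                                # -10 spent
their_ledger = -Y                              # -25 destroyed
net_world_product = our_ledger + their_ledger  # -35

print(conventional_output, net_world_product)  # 10 -35
```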
I agree that economic models are not optimal for war, but I see a significant problem in that the way we think about war and the way we think about war preparation are treated as separate magisteria, and as a consequence military procurement is viewed in Congress as an economic stimulus rather than something of strategic import.
Go a little further, and I’ll absolutely agree. Economic models that only consider accounting entities (currency and reportable valuation) are pretty limited in understanding most human decisions. I think war is just one case of this. You could say the same for, say, having children—it’s a pure expense for the parents, from an economic standpoint. But for many, it’s the primary joy in life and motivation for all the economic activity they partake in.
Not at all. The vast majority of weapons and military (or hobby/self-defense) spending are never used to harm an enemy. The value is the perception of strength, and relatedly, the threat of destruction. Actual destruction is minor.
That Congress (and voters) are economically naïve is a distinct problem. It probably doesn’t get fixed by the additional naivete of forcing negative-value concepts into the wrong framework. If it can be fixed, it’s probably by making the broken window fallacy (https://en.wikipedia.org/wiki/Parable_of_the_broken_window) less common among the populace.
The map is not independent of the territory, here. Few cities were destroyed by nuclear weapons, but no one would have cared about them if they couldn’t destroy cities. Destruction is the baseline reality upon which perceptions of strength operate. The whole value of the perception of strength is avoiding actual destructive exchanges; destruction remains the true concern for the overwhelming majority of such spending.
The problem I see is that war is not distinct from economics except as an abstraction; in reality they describe the same system. What this means is that we have a partial model of one perspective on the system, and total neglect of another. Normally we might say not to let the perfect be the enemy of the good, but we’re at the other end of the spectrum, so it is more like recruiting the really bad to be an enemy of the irredeemably awful.
Which is to say that economic-adjacent arguments are something the public at large is familiar with, and their right-or-wrong beliefs are part of the lens through which they will view any new information and judge any new frameworks.
Quite separately, I would find economics much more comprehensible if it included negatives throughout; as far as I can tell there is no conceptual motivation for avoiding them, and it is mostly a matter of computational convenience. I would be happy to be wrong; if I could figure out the motivation, it would probably help me follow the logic better.
The bottom line is protection, expansion, and/or survival; destruction is only an intermediate goal.